git@vger.kernel.org mailing list mirror (one of many)
 help / Atom feed
* [PATCH 0/3] am/rebase: share read_author_script()
@ 2018-09-12 10:10 Phillip Wood
  2018-09-12 10:10 ` [PATCH 1/3] am: rename read_author_script() Phillip Wood
                   ` (5 more replies)
  0 siblings, 6 replies; 41+ messages in thread
From: Phillip Wood @ 2018-09-12 10:10 UTC (permalink / raw)
  To: Junio C Hamano, Johannes Schindelin
  Cc: Git Mailing List, Eric Sunshine, Phillip Wood

From: Phillip Wood <phillip.wood@dunelm.org.uk>

This is a follow up to pw/rebase-i-author-script-fix, it reduces code
duplication and improves rebase's parsing of the author script. After
this I'll do another series to share the code to write the author
script.

Phillip Wood (3):
  am: rename read_author_script()
  add read_author_script() to libgit
  sequencer: use read_author_script()

 builtin/am.c |  61 ++------------------
 sequencer.c  | 160 ++++++++++++++++++++++++++++-----------------------
 sequencer.h  |   3 +
 3 files changed, 96 insertions(+), 128 deletions(-)

-- 
2.18.0


^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCH 1/3] am: rename read_author_script()
  2018-09-12 10:10 [PATCH 0/3] am/rebase: share read_author_script() Phillip Wood
@ 2018-09-12 10:10 ` Phillip Wood
  2018-09-12 10:10 ` [PATCH 2/3] add read_author_script() to libgit Phillip Wood
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 41+ messages in thread
From: Phillip Wood @ 2018-09-12 10:10 UTC (permalink / raw)
  To: Junio C Hamano, Johannes Schindelin
  Cc: Git Mailing List, Eric Sunshine, Phillip Wood

From: Phillip Wood <phillip.wood@dunelm.org.uk>

Rename read_author_script() in preparation for adding a shared
read_author_script() function to libgit.

Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
---
 builtin/am.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/builtin/am.c b/builtin/am.c
index 5e866d17c7..8c165f747b 100644
--- a/builtin/am.c
+++ b/builtin/am.c
@@ -302,7 +302,7 @@ static int parse_key_value_squoted(char *buf, struct string_list *list)
  * script, and thus if the file differs from what this function expects, it is
  * better to bail out than to do something that the user does not expect.
  */
-static int read_author_script(struct am_state *state)
+static int read_am_author_script(struct am_state *state)
 {
 	const char *filename = am_path(state, "author-script");
 	struct strbuf buf = STRBUF_INIT;
@@ -411,7 +411,7 @@ static void am_load(struct am_state *state)
 		BUG("state file 'last' does not exist");
 	state->last = strtol(sb.buf, NULL, 10);
 
-	if (read_author_script(state) < 0)
+	if (read_am_author_script(state) < 0)
 		die(_("could not parse author script"));
 
 	read_commit_msg(state);
-- 
2.18.0


^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCH 2/3] add read_author_script() to libgit
  2018-09-12 10:10 [PATCH 0/3] am/rebase: share read_author_script() Phillip Wood
  2018-09-12 10:10 ` [PATCH 1/3] am: rename read_author_script() Phillip Wood
@ 2018-09-12 10:10 ` Phillip Wood
  2018-09-13 23:49   ` Eric Sunshine
  2018-09-12 10:10 ` [PATCH 3/3] sequencer: use read_author_script() Phillip Wood
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 41+ messages in thread
From: Phillip Wood @ 2018-09-12 10:10 UTC (permalink / raw)
  To: Junio C Hamano, Johannes Schindelin
  Cc: Git Mailing List, Eric Sunshine, Phillip Wood

From: Phillip Wood <phillip.wood@dunelm.org.uk>

Add read_author_script() to sequencer.c based on the implementation in
builtin/am.c and update read_am_author_script() to use
read_author_script(). The sequencer code that reads the author script
will be updated in the next commit.

Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
---
 builtin/am.c | 57 ++++-------------------------------------------
 sequencer.c  | 62 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 sequencer.h  |  3 +++
 3 files changed, 69 insertions(+), 53 deletions(-)

diff --git a/builtin/am.c b/builtin/am.c
index 8c165f747b..aa5de0ee73 100644
--- a/builtin/am.c
+++ b/builtin/am.c
@@ -260,32 +260,6 @@ static int read_state_file(struct strbuf *sb, const struct am_state *state,
 	die_errno(_("could not read '%s'"), am_path(state, file));
 }
 
-/**
- * Take a series of KEY='VALUE' lines where VALUE part is
- * sq-quoted, and append <KEY, VALUE> at the end of the string list
- */
-static int parse_key_value_squoted(char *buf, struct string_list *list)
-{
-	while (*buf) {
-		struct string_list_item *item;
-		char *np;
-		char *cp = strchr(buf, '=');
-		if (!cp)
-			return -1;
-		np = strchrnul(cp, '\n');
-		*cp++ = '\0';
-		item = string_list_append(list, buf);
-
-		buf = np + (*np == '\n');
-		*np = '\0';
-		cp = sq_dequote(cp);
-		if (!cp)
-			return -1;
-		item->util = xstrdup(cp);
-	}
-	return 0;
-}
-
 /**
  * Reads and parses the state directory's "author-script" file, and sets
  * state->author_name, state->author_email and state->author_date accordingly.
@@ -305,39 +279,16 @@ static int parse_key_value_squoted(char *buf, struct string_list *list)
 static int read_am_author_script(struct am_state *state)
 {
 	const char *filename = am_path(state, "author-script");
-	struct strbuf buf = STRBUF_INIT;
-	struct string_list kv = STRING_LIST_INIT_DUP;
-	int retval = -1; /* assume failure */
-	int fd;
 
 	assert(!state->author_name);
 	assert(!state->author_email);
 	assert(!state->author_date);
 
-	fd = open(filename, O_RDONLY);
-	if (fd < 0) {
-		if (errno == ENOENT)
-			return 0;
-		die_errno(_("could not open '%s' for reading"), filename);
-	}
-	strbuf_read(&buf, fd, 0);
-	close(fd);
-	if (parse_key_value_squoted(buf.buf, &kv))
-		goto finish;
+	if (read_author_script(filename, &state->author_name,
+			       &state->author_email, &state->author_date, 1))
+		exit(128);
 
-	if (kv.nr != 3 ||
-	    strcmp(kv.items[0].string, "GIT_AUTHOR_NAME") ||
-	    strcmp(kv.items[1].string, "GIT_AUTHOR_EMAIL") ||
-	    strcmp(kv.items[2].string, "GIT_AUTHOR_DATE"))
-		goto finish;
-	state->author_name = kv.items[0].util;
-	state->author_email = kv.items[1].util;
-	state->author_date = kv.items[2].util;
-	retval = 0;
-finish:
-	string_list_clear(&kv, !!retval);
-	strbuf_release(&buf);
-	return retval;
+	return 0;
 }
 
 /**
diff --git a/sequencer.c b/sequencer.c
index dc2c58d464..5d0ff8f1c1 100644
--- a/sequencer.c
+++ b/sequencer.c
@@ -660,6 +660,68 @@ static int write_author_script(const char *message)
 	return res;
 }
 
+/**
+ * Take a series of KEY='VALUE' lines where VALUE part is
+ * sq-quoted, and append <KEY, VALUE> at the end of the string list
+ */
+static int parse_key_value_squoted(char *buf, struct string_list *list)
+{
+	while (*buf) {
+		struct string_list_item *item;
+		char *np;
+		char *cp = strchr(buf, '=');
+		if (!cp)
+			return -1;
+		np = strchrnul(cp, '\n');
+		*cp++ = '\0';
+		item = string_list_append(list, buf);
+
+		buf = np + (*np == '\n');
+		*np = '\0';
+		cp = sq_dequote(cp);
+		if (!cp)
+			return -1;
+		item->util = xstrdup(cp);
+	}
+	return 0;
+}
+
+int read_author_script(const char *path, char **name, char **email, char **date,
+		       int allow_missing)
+{
+	struct strbuf buf = STRBUF_INIT;
+	struct string_list kv = STRING_LIST_INIT_DUP;
+	int retval = -1;
+
+	if (strbuf_read_file(&buf, path, 256) <= 0) {
+		strbuf_release(&buf);
+		if (errno == ENOENT && allow_missing)
+			return 0;
+		else
+			return error_errno(_("could not open '%s' for reading"),
+					   path);
+	}
+
+	if (parse_key_value_squoted(buf.buf, &kv)) {
+		error(_("unable to parse '%s'"), path);
+		goto finish;
+	}
+	if (kv.nr != 3 ||
+	    strcmp(kv.items[0].string, "GIT_AUTHOR_NAME") ||
+	    strcmp(kv.items[1].string, "GIT_AUTHOR_EMAIL") ||
+	    strcmp(kv.items[2].string, "GIT_AUTHOR_DATE")) {
+		error(_("unable to parse '%s'"), path);
+		goto finish;
+	}
+	*name = kv.items[0].util;
+	*email = kv.items[1].util;
+	*date = kv.items[2].util;
+	retval = 0;
+finish:
+	string_list_clear(&kv, !!retval);
+	strbuf_release(&buf);
+	return retval;
+}
 
 /*
  * write_author_script() used to fail to terminate the last line with a "'" and
diff --git a/sequencer.h b/sequencer.h
index c751c9d6e4..3713f955f5 100644
--- a/sequencer.h
+++ b/sequencer.h
@@ -107,4 +107,7 @@ void commit_post_rewrite(const struct commit *current_head,
 #define SUMMARY_SHOW_AUTHOR_DATE (1 << 1)
 void print_commit_summary(const char *prefix, const struct object_id *oid,
 			  unsigned int flags);
+
+int read_author_script(const char *path, char **name, char **email, char **date,
+		       int allow_missing);
 #endif
-- 
2.18.0


^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCH 3/3] sequencer: use read_author_script()
  2018-09-12 10:10 [PATCH 0/3] am/rebase: share read_author_script() Phillip Wood
  2018-09-12 10:10 ` [PATCH 1/3] am: rename read_author_script() Phillip Wood
  2018-09-12 10:10 ` [PATCH 2/3] add read_author_script() to libgit Phillip Wood
@ 2018-09-12 10:10 ` Phillip Wood
  2018-09-12 12:14   ` Eric Sunshine
  2018-09-14  0:31   ` Eric Sunshine
  2018-10-18 10:00 ` [PATCH v2 0/5] am/rebase: share read_author_script() Phillip Wood
                   ` (2 subsequent siblings)
  5 siblings, 2 replies; 41+ messages in thread
From: Phillip Wood @ 2018-09-12 10:10 UTC (permalink / raw)
  To: Junio C Hamano, Johannes Schindelin
  Cc: Git Mailing List, Eric Sunshine, Phillip Wood

From: Phillip Wood <phillip.wood@dunelm.org.uk>

Use the new function to read the author script, updating
read_env_script() and read_author_ident(). This means we now have a
single code path that reads the author script and uses sq_dequote() to
dequote it. This fixes potential problems with user edited scripts
as read_env_script() which did not track quotes properly.

This commit also removes the fallback code for checking for a broken
author script after git is upgraded when a rebase is stopped. Now that
the parsing uses sq_dequote() it will reliably return an error if the
quoting is broken and the user will have to abort the rebase and
restart. This isn't ideal but it's a corner case and the detection of
the broken quoting could be confused by user edited author scripts.

Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
---
 sequencer.c | 108 +++++++++++++++-------------------------------------
 1 file changed, 30 insertions(+), 78 deletions(-)

diff --git a/sequencer.c b/sequencer.c
index 5d0ff8f1c1..630741cfe0 100644
--- a/sequencer.c
+++ b/sequencer.c
@@ -723,54 +723,35 @@ int read_author_script(const char *path, char **name, char **email, char **date,
 	return retval;
 }
 
-/*
- * write_author_script() used to fail to terminate the last line with a "'" and
- * also escaped "'" incorrectly as "'\\\\''" rather than "'\\''". We check for
- * the terminating "'" on the last line to see how "'" has been escaped in case
- * git was upgraded while rebase was stopped.
- */
-static int quoting_is_broken(const char *s, size_t n)
-{
-	/* Skip any empty lines in case the file was hand edited */
-	while (n > 0 && s[--n] == '\n')
-		; /* empty */
-	if (n > 0 && s[n] != '\'')
-		return 1;
-
-	return 0;
-}
-
 /*
  * Read a list of environment variable assignments (such as the author-script
  * file) into an environment block. Returns -1 on error, 0 otherwise.
  */
 static int read_env_script(struct argv_array *env)
 {
 	struct strbuf script = STRBUF_INIT;
-	int i, count = 0, sq_bug;
-	const char *p2;
-	char *p;
+	char *name, *email, *date;
 
-	if (strbuf_read_file(&script, rebase_path_author_script(), 256) <= 0)
+	if (read_author_script(rebase_path_author_script(),
+			       &name, &email, &date, 0))
 		return -1;
-	/* write_author_script() used to quote incorrectly */
-	sq_bug = quoting_is_broken(script.buf, script.len);
-	for (p = script.buf; *p; p++)
-		if (sq_bug && skip_prefix(p, "'\\\\''", &p2))
-			strbuf_splice(&script, p - script.buf, p2 - p, "'", 1);
-		else if (skip_prefix(p, "'\\''", &p2))
-			strbuf_splice(&script, p - script.buf, p2 - p, "'", 1);
-		else if (*p == '\'')
-			strbuf_splice(&script, p-- - script.buf, 1, "", 0);
-		else if (*p == '\n') {
-			*p = '\0';
-			count++;
-		}
 
-	for (i = 0, p = script.buf; i < count; i++) {
-		argv_array_push(env, p);
-		p += strlen(p) + 1;
-	}
+	strbuf_addstr(&script, "GIT_AUTHOR_NAME=");
+	strbuf_addstr(&script, name);
+	argv_array_push(env, script.buf);
+	strbuf_reset(&script);
+	strbuf_addstr(&script, "GIT_AUTHOR_EMAIL=");
+	strbuf_addstr(&script, email);
+	argv_array_push(env, script.buf);
+	strbuf_reset(&script);
+	strbuf_addstr(&script, "GIT_AUTHOR_DATE=");
+	strbuf_addstr(&script, date);
+	argv_array_push(env, script.buf);
+	strbuf_release(&script);
+
+	free(name);
+	free(email);
+	free(date);
 
 	return 0;
 }
@@ -790,54 +771,25 @@ static char *get_author(const char *message)
 /* Read author-script and return an ident line (author <email> timestamp) */
 static const char *read_author_ident(struct strbuf *buf)
 {
-	const char *keys[] = {
-		"GIT_AUTHOR_NAME=", "GIT_AUTHOR_EMAIL=", "GIT_AUTHOR_DATE="
-	};
-	struct strbuf out = STRBUF_INIT;
-	char *in, *eol;
-	const char *val[3];
-	int i = 0;
+	char *name, *email, *date;
 
-	if (strbuf_read_file(buf, rebase_path_author_script(), 256) <= 0)
+	if (read_author_script(rebase_path_author_script(),
+			       &name, &email, &date, 0))
 		return NULL;
 
-	/* dequote values and construct ident line in-place */
-	for (in = buf->buf; i < 3 && in - buf->buf < buf->len; i++) {
-		if (!skip_prefix(in, keys[i], (const char **)&in)) {
-			warning(_("could not parse '%s' (looking for '%s')"),
-				rebase_path_author_script(), keys[i]);
-			return NULL;
-		}
-
-		eol = strchrnul(in, '\n');
-		*eol = '\0';
-		if (!sq_dequote(in)) {
-			warning(_("bad quoting on %s value in '%s'"),
-				keys[i], rebase_path_author_script());
-			return NULL;
-		}
-		val[i] = in;
-		in = eol + 1;
-	}
-
-	if (i < 3) {
-		warning(_("could not parse '%s' (looking for '%s')"),
-			rebase_path_author_script(), keys[i]);
-		return NULL;
-	}
-
 	/* validate date since fmt_ident() will die() on bad value */
-	if (parse_date(val[2], &out)){
+	if (parse_date(date, buf)){
 		warning(_("invalid date format '%s' in '%s'"),
-			val[2], rebase_path_author_script());
-		strbuf_release(&out);
+			date, rebase_path_author_script());
+		strbuf_release(buf);
 		return NULL;
 	}
 
-	strbuf_reset(&out);
-	strbuf_addstr(&out, fmt_ident(val[0], val[1], val[2], 0));
-	strbuf_swap(buf, &out);
-	strbuf_release(&out);
+	strbuf_reset(buf);
+	strbuf_addstr(buf, fmt_ident(name, email, date, 0));
+	free(name);
+	free(email);
+	free(date);
 	return buf->buf;
 }
 
-- 
2.18.0


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 3/3] sequencer: use read_author_script()
  2018-09-12 10:10 ` [PATCH 3/3] sequencer: use read_author_script() Phillip Wood
@ 2018-09-12 12:14   ` Eric Sunshine
  2018-09-14  0:31   ` Eric Sunshine
  1 sibling, 0 replies; 41+ messages in thread
From: Eric Sunshine @ 2018-09-12 12:14 UTC (permalink / raw)
  To: Phillip Wood; +Cc: Junio C Hamano, Johannes Schindelin, Git List

On Wed, Sep 12, 2018 at 6:11 AM Phillip Wood <phillip.wood@talktalk.net> wrote:
> Use the new function to read the author script, updating
> read_env_script() and read_author_ident(). This means we now have a
> single code path that reads the author script and uses sq_dequote() to
> dequote it. This fixes potential problems with user edited scripts
> as read_env_script() which did not track quotes properly.
> [...]
> Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
> ---
> diff --git a/sequencer.c b/sequencer.c
> @@ -723,54 +723,35 @@ int read_author_script(const char *path, char **name, char **email, char **date,
>  static int read_env_script(struct argv_array *env)
>  {
> +       strbuf_addstr(&script, "GIT_AUTHOR_NAME=");
> +       strbuf_addstr(&script, name);
> +       argv_array_push(env, script.buf);
> +       strbuf_reset(&script);
> +       strbuf_addstr(&script, "GIT_AUTHOR_EMAIL=");
> +       strbuf_addstr(&script, email);
> +       argv_array_push(env, script.buf);
> +       strbuf_reset(&script);
> +       strbuf_addstr(&script, "GIT_AUTHOR_DATE=");
> +       strbuf_addstr(&script, date);
> +       argv_array_push(env, script.buf);
> +       strbuf_release(&script);

I haven't read this series closely yet, but this caught my eye while
scanning it quickly. The above noisy code can all be collapsed to the
simpler:

    argv_array_pushf(env, "GIT_AUTHOR_NAME=%s", name);
    argv_array_pushf(env, "GIT_AUTHOR_EMAIL =%s", email);
    argv_array_pushf(env, "GIT_AUTHOR_DATE =%s", date);

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 2/3] add read_author_script() to libgit
  2018-09-12 10:10 ` [PATCH 2/3] add read_author_script() to libgit Phillip Wood
@ 2018-09-13 23:49   ` Eric Sunshine
  2018-10-10 10:14     ` Phillip Wood
  0 siblings, 1 reply; 41+ messages in thread
From: Eric Sunshine @ 2018-09-13 23:49 UTC (permalink / raw)
  To: Phillip Wood; +Cc: Junio C Hamano, Johannes Schindelin, Git List

On Wed, Sep 12, 2018 at 6:11 AM Phillip Wood <phillip.wood@talktalk.net> wrote:
> Add read_author_script() to sequencer.c based on the implementation in
> builtin/am.c and update read_am_author_script() to use
> read_author_script(). The sequencer code that reads the author script
> will be updated in the next commit.
>
> Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
> ---
> diff --git a/builtin/am.c b/builtin/am.c
> @@ -305,39 +279,16 @@ static int parse_key_value_squoted(char *buf, struct string_list *list)
>  static int read_am_author_script(struct am_state *state)
>  {

This function returns 'int'...

>         const char *filename = am_path(state, "author-script");
> +       if (read_author_script(filename, &state->author_name,
> +                              &state->author_email, &state->author_date, 1))
> +               exit(128);

...but only ever exit()s...

> +       return 0;

... or returns 0.

>  }

It's a little disturbing that it now invokes exit() directly rather
than calling die() since die() may involve extra functionality before
actually exiting.

What if, instead of exit()ing directly, you drop the conditional and
instead return the value of read_author_script():

    return read_author_script(...);

and let the caller deal with the zero or non-zero return value as
usual? (True, you'd get two error messages upon failure rather than
just one, but that's not necessarily a bad thing.)

A possibly valid response is that such a change is outside the scope
of this series since the original code shared this odd architecture of
sometimes returning 0, sometimes -1, and sometimes die()ing.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 3/3] sequencer: use read_author_script()
  2018-09-12 10:10 ` [PATCH 3/3] sequencer: use read_author_script() Phillip Wood
  2018-09-12 12:14   ` Eric Sunshine
@ 2018-09-14  0:31   ` Eric Sunshine
  2018-10-10 10:35     ` Phillip Wood
  1 sibling, 1 reply; 41+ messages in thread
From: Eric Sunshine @ 2018-09-14  0:31 UTC (permalink / raw)
  To: Phillip Wood; +Cc: Junio C Hamano, Johannes Schindelin, Git List

On Wed, Sep 12, 2018 at 6:11 AM Phillip Wood <phillip.wood@talktalk.net> wrote:
> Use the new function to read the author script, updating
> read_env_script() and read_author_ident(). This means we now have a
> single code path that reads the author script and uses sq_dequote() to
> dequote it. This fixes potential problems with user edited scripts
> as read_env_script() which did not track quotes properly.
> [...]
> Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
> ---
>  /*
>   * Read a list of environment variable assignments (such as the author-script
>   * file) into an environment block. Returns -1 on error, 0 otherwise.
>   */

According to this comment, this function is capable of parsing a file
of arbitrary "NAME=Value" lines, and indeed the original code does
just that, but...

>  static int read_env_script(struct argv_array *env)
>  {
> +       char *name, *email, *date;
>
> -       if (strbuf_read_file(&script, rebase_path_author_script(), 256) <= 0)
> +       if (read_author_script(rebase_path_author_script(),
> +                              &name, &email, &date, 0))

...the new implementation is able to handle only GIT_AUTHOR_NAME,
GIT_AUTHOR_EMAIL, and GIT_AUTHOR_DATE, in exactly that order.

This seems like a pretty serious (and possibly buggy) change of
behavior, and makes the function much less useful (in general). Is it
true that it will only ever be used for files containing that limited
set of names? If so, the behavior change deserves mention in the
commit message, the function comment needs updating, and the function
itself probably ought to be renamed.

> +       strbuf_addstr(&script, "GIT_AUTHOR_NAME=");
> +       strbuf_addstr(&script, name);
> +       argv_array_push(env, script.buf);
> +       strbuf_reset(&script);
> +       strbuf_addstr(&script, "GIT_AUTHOR_EMAIL=");
> +       strbuf_addstr(&script, email);
> +       argv_array_push(env, script.buf);
> +       strbuf_reset(&script);
> +       strbuf_addstr(&script, "GIT_AUTHOR_DATE=");
> +       strbuf_addstr(&script, date);
> +       argv_array_push(env, script.buf);
> +       strbuf_release(&script);

Mentioned earlier[1], this can all collapse down to:

argv_array_pushf(env, "GIT_AUTHOR_NAME=%s", name);
argv_array_pushf(env, "GIT_AUTHOR_EMAIL=%s", email);
argv_array_pushf(env, "GIT_AUTHOR_DATE=%s", date);

However, it's unfortunate that this manual and hard-coded
reconstruction is needed at all. If you restructure the factoring of
this patch series, such ugliness can be avoided altogether. For
instance, the series could be structured like this:

1. Introduce a general-purpose function for reading a file containing
arbitrary "NAME=Value" lines (not carrying about specific key names or
their order) and returning them in some data structure (perhaps via
'string_list' as parse_key_value_squoted() in patch 2/3 does).

2. Build read_author_script() atop #1, making it expect and extract
GIT_AUTHOR_NAME, GIT_AUTHOR_EMAIL, and GIT_AUTHOR_DATE (possibly in
that order, or possibly not if we don't care).

3. Retrofit existing parsers to call one of those two functions (this
step may happen over several patches). So, for instance,
read_env_script() would call the generic parser from #1, whereas
sequencer.c:read_author_ident() would call the more specific parser
from #2.

More below...

> @@ -790,54 +771,25 @@ static char *get_author(const char *message)
>  /* Read author-script and return an ident line (author <email> timestamp) */
>  static const char *read_author_ident(struct strbuf *buf)
>  {
> -       const char *keys[] = {
> -               "GIT_AUTHOR_NAME=", "GIT_AUTHOR_EMAIL=", "GIT_AUTHOR_DATE="
> -       };
> -       if (strbuf_read_file(buf, rebase_path_author_script(), 256) <= 0)
> +       if (read_author_script(rebase_path_author_script(),
> +                              &name, &email, &date, 0))
>                 return NULL;
> -       /* dequote values and construct ident line in-place */
> -       for (in = buf->buf; i < 3 && in - buf->buf < buf->len; i++) {
> -               if (!skip_prefix(in, keys[i], (const char **)&in)) {
> -                       warning(_("could not parse '%s' (looking for '%s')"),
> -                               rebase_path_author_script(), keys[i]);
> -                       return NULL;
> -               }
> -               if (!sq_dequote(in)) {
> -                       warning(_("bad quoting on %s value in '%s'"),
> -                               keys[i], rebase_path_author_script());
> -                       return NULL;
> -               }
> -       if (i < 3) {
> -               warning(_("could not parse '%s' (looking for '%s')"),
> -                       rebase_path_author_script(), keys[i]);
> -               return NULL;
> -       }

The parsing code being thrown away here does a better job of
diagnosing problems (thus helping the user figure out what went wrong)
than the new shared parser introduced by patch 2/3. The shared
function only ever reports a generic "unable to parse", whereas the
above code gets specific, saying that it was looking for a particular
key or that quoting was broken. I'd have expected the new shared
parser to encompass the best features of the existing parsers (such as
presenting better error messages).

>         /* validate date since fmt_ident() will die() on bad value */
> -       if (parse_date(val[2], &out)){
> +       if (parse_date(date, buf)){

Re-purposing the strbuf 'buf', which is passed into this function,
binds this function too tightly with its caller by assuming that the
caller will never need the original content of 'buf' anymore. Thus, it
would be better for this code continue using its own local strbuf
'out' rather than re-purposing the incoming 'buf'.

>                 warning(_("invalid date format '%s' in '%s'"),
> -                       val[2], rebase_path_author_script());
> -               strbuf_release(&out);
> +                       date, rebase_path_author_script());
> +               strbuf_release(buf);

Likewise, it's doubly odd to see this function releasing 'buf' which
it does not own.

>                 return NULL;
>         }

[1]: https://public-inbox.org/git/CAPig+cRvUr26GZyW6ecYhpwABueBqaEfZH1+JjLaqZo8+RTD6Q@mail.gmail.com/

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 2/3] add read_author_script() to libgit
  2018-09-13 23:49   ` Eric Sunshine
@ 2018-10-10 10:14     ` Phillip Wood
  2018-10-10 10:22       ` Eric Sunshine
  0 siblings, 1 reply; 41+ messages in thread
From: Phillip Wood @ 2018-10-10 10:14 UTC (permalink / raw)
  To: Eric Sunshine, Phillip Wood; +Cc: Junio C Hamano, Johannes Schindelin, Git List

Hi Eric

Thanks for looking at this series, sorry it has taken so long for me to
reply.

On 14/09/2018 00:49, Eric Sunshine wrote:
> On Wed, Sep 12, 2018 at 6:11 AM Phillip Wood <phillip.wood@talktalk.net> wrote:
>> Add read_author_script() to sequencer.c based on the implementation in
>> builtin/am.c and update read_am_author_script() to use
>> read_author_script(). The sequencer code that reads the author script
>> will be updated in the next commit.
>>
>> Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
>> ---
>> diff --git a/builtin/am.c b/builtin/am.c
>> @@ -305,39 +279,16 @@ static int parse_key_value_squoted(char *buf, struct string_list *list)
>>  static int read_am_author_script(struct am_state *state)
>>  {
> 
> This function returns 'int'...
> 
>>         const char *filename = am_path(state, "author-script");
>> +       if (read_author_script(filename, &state->author_name,
>> +                              &state->author_email, &state->author_date, 1))
>> +               exit(128);
> 
> ...but only ever exit()s...
> 
>> +       return 0;
> 
> ... or returns 0.
> 
>>  }
> 
> It's a little disturbing that it now invokes exit() directly rather
> than calling die() since die() may involve extra functionality before
> actually exiting.
> 
> What if, instead of exit()ing directly, you drop the conditional and
> instead return the value of read_author_script():
> 
>     return read_author_script(...);
> 
> and let the caller deal with the zero or non-zero return value as
> usual? (True, you'd get two error messages upon failure rather than
> just one, but that's not necessarily a bad thing.)
> 
> A possibly valid response is that such a change is outside the scope
> of this series since the original code shared this odd architecture of
> sometimes returning 0, sometimes -1, and sometimes die()ing.

My aim was to try and to keep the changes to a minimum as this patch
isn't about changing the odd architecture of the original. I could add a
follow up patch that cleans things up as you suggest.

Best Wishes

Phillip


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 2/3] add read_author_script() to libgit
  2018-10-10 10:14     ` Phillip Wood
@ 2018-10-10 10:22       ` Eric Sunshine
  0 siblings, 0 replies; 41+ messages in thread
From: Eric Sunshine @ 2018-10-10 10:22 UTC (permalink / raw)
  To: Phillip Wood; +Cc: Junio C Hamano, Johannes Schindelin, Git List

On Wed, Oct 10, 2018 at 6:14 AM Phillip Wood <phillip.wood@talktalk.net> wrote:
> On 14/09/2018 00:49, Eric Sunshine wrote:
> > What if, instead of exit()ing directly, you drop the conditional and
> > instead return the value of read_author_script():
> >
> >     return read_author_script(...);
> >
> > and let the caller deal with the zero or non-zero return value as
> > usual? (True, you'd get two error messages upon failure rather than
> > just one, but that's not necessarily a bad thing.)
> >
> > A possibly valid response is that such a change is outside the scope
> > of this series since the original code shared this odd architecture of
> > sometimes returning 0, sometimes -1, and sometimes die()ing.
>
> My aim was to try and to keep the changes to a minimum as this patch
> isn't about changing the odd architecture of the original. I could add a
> follow up patch that cleans things up as you suggest.

The code already had that weird mix of return(s) and die(), hence the
state is no worse after this patch than before. So, a cleanup patch
later in the series, a follow up series, or doing nothing at all are
all reasonable approaches. I don't insist on it for this patch series.
Thanks.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 3/3] sequencer: use read_author_script()
  2018-09-14  0:31   ` Eric Sunshine
@ 2018-10-10 10:35     ` Phillip Wood
  0 siblings, 0 replies; 41+ messages in thread
From: Phillip Wood @ 2018-10-10 10:35 UTC (permalink / raw)
  To: Eric Sunshine, Phillip Wood; +Cc: Junio C Hamano, Johannes Schindelin, Git List

On 14/09/2018 01:31, Eric Sunshine wrote:
> On Wed, Sep 12, 2018 at 6:11 AM Phillip Wood <phillip.wood@talktalk.net> wrote:
>> Use the new function to read the author script, updating
>> read_env_script() and read_author_ident(). This means we now have a
>> single code path that reads the author script and uses sq_dequote() to
>> dequote it. This fixes potential problems with user edited scripts
>> as read_env_script() which did not track quotes properly.
>> [...]
>> Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
>> ---
>>  /*
>>   * Read a list of environment variable assignments (such as the author-script
>>   * file) into an environment block. Returns -1 on error, 0 otherwise.
>>   */
> 
> According to this comment, this function is capable of parsing a file
> of arbitrary "NAME=Value" lines, and indeed the original code does
> just that, but...
> 
>>  static int read_env_script(struct argv_array *env)
>>  {
>> +       char *name, *email, *date;
>>
>> -       if (strbuf_read_file(&script, rebase_path_author_script(), 256) <= 0)
>> +       if (read_author_script(rebase_path_author_script(),
>> +                              &name, &email, &date, 0))
> 
> ...the new implementation is able to handle only GIT_AUTHOR_NAME,
> GIT_AUTHOR_EMAIL, and GIT_AUTHOR_DATE, in exactly that order.
> 
> This seems like a pretty serious (and possibly buggy) change of
> behavior, and makes the function much less useful (in general). Is it
> true that it will only ever be used for files containing that limited
> set of names? If so, the behavior change deserves mention in the
> commit message, the function comment needs updating, and the function
> itself probably ought to be renamed.

You're right the change in behavior should be mentioned explicitly, I'd
not thought about it in those terms. I'm not sure if the change is
buggy, this code is what am uses for its author script handling. To me
the point of the author-script file is to set the author details, not to
set arbitrary environment variables. We have already significantly
reduced what someone can do with this file in the transition from shell
to C as we no longer support arbitrary shell code in the file. I'd
rather try and reuse the existing code from am unless someone can
demonstrate an active use for something more general. (I'm still not
sure what use editing the author-script is - it is only of use if the
rebase stops for conflicts, it cannot be used to change the author of an
arbitrary set of commits)

>> +       strbuf_addstr(&script, "GIT_AUTHOR_NAME=");
>> +       strbuf_addstr(&script, name);
>> +       argv_array_push(env, script.buf);
>> +       strbuf_reset(&script);
>> +       strbuf_addstr(&script, "GIT_AUTHOR_EMAIL=");
>> +       strbuf_addstr(&script, email);
>> +       argv_array_push(env, script.buf);
>> +       strbuf_reset(&script);
>> +       strbuf_addstr(&script, "GIT_AUTHOR_DATE=");
>> +       strbuf_addstr(&script, date);
>> +       argv_array_push(env, script.buf);
>> +       strbuf_release(&script);
> 
> Mentioned earlier[1], this can all collapse down to:
> 
> argv_array_pushf(env, "GIT_AUTHOR_NAME=%s", name);
> argv_array_pushf(env, "GIT_AUTHOR_EMAIL=%s", email);
> argv_array_pushf(env, "GIT_AUTHOR_DATE=%s", date);
> 
> However, it's unfortunate that this manual and hard-coded
> reconstruction is needed at all. If you restructure the factoring of
> this patch series, such ugliness can be avoided altogether. For
> instance, the series could be structured like this:
> 
> 1. Introduce a general-purpose function for reading a file containing
> arbitrary "NAME=Value" lines (not carrying about specific key names or
> their order) and returning them in some data structure (perhaps via
> 'string_list' as parse_key_value_squoted() in patch 2/3 does).
> 
> 2. Build read_author_script() atop #1, making it expect and extract
> GIT_AUTHOR_NAME, GIT_AUTHOR_EMAIL, and GIT_AUTHOR_DATE (possibly in
> that order, or possibly not if we don't care).
> 
> 3. Retrofit existing parsers to call one of those two functions (this
> step may happen over several patches). So, for instance,
> read_env_script() would call the generic parser from #1, whereas
> sequencer.c:read_author_ident() would call the more specific parser
> from #2.

That plan requires all new code rather than reusing tried and tested
code from am. Furthermore I'm not sure it has any advantage as far as
users are concerned. My aim with this series was to try and do something
fairly simple that reused the parsing from am, rather than build a whole
new system with its own bugs.

> More below...
> 
>> @@ -790,54 +771,25 @@ static char *get_author(const char *message)
>>  /* Read author-script and return an ident line (author <email> timestamp) */
>>  static const char *read_author_ident(struct strbuf *buf)
>>  {
>> -       const char *keys[] = {
>> -               "GIT_AUTHOR_NAME=", "GIT_AUTHOR_EMAIL=", "GIT_AUTHOR_DATE="
>> -       };
>> -       if (strbuf_read_file(buf, rebase_path_author_script(), 256) <= 0)
>> +       if (read_author_script(rebase_path_author_script(),
>> +                              &name, &email, &date, 0))
>>                 return NULL;
>> -       /* dequote values and construct ident line in-place */
>> -       for (in = buf->buf; i < 3 && in - buf->buf < buf->len; i++) {
>> -               if (!skip_prefix(in, keys[i], (const char **)&in)) {
>> -                       warning(_("could not parse '%s' (looking for '%s')"),
>> -                               rebase_path_author_script(), keys[i]);
>> -                       return NULL;
>> -               }
>> -               if (!sq_dequote(in)) {
>> -                       warning(_("bad quoting on %s value in '%s'"),
>> -                               keys[i], rebase_path_author_script());
>> -                       return NULL;
>> -               }
>> -       if (i < 3) {
>> -               warning(_("could not parse '%s' (looking for '%s')"),
>> -                       rebase_path_author_script(), keys[i]);
>> -               return NULL;
>> -       }
> 
> The parsing code being thrown away here does a better job of
> diagnosing problems (thus helping the user figure out what went wrong)
> than the new shared parser introduced by patch 2/3. The shared
> function only ever reports a generic "unable to parse", whereas the
> above code gets specific, saying that it was looking for a particular
> key or that quoting was broken. I'd have expected the new shared
> parser to encompass the best features of the existing parsers (such as
> presenting better error messages).

You're right but the context is that this function is only used when the
root commit is rebased (and then only for the root commit). Everything
else goes through read_env_script() which doesn't even bother to check
if all the variables have been set or report any parsing errors.

> 
>>         /* validate date since fmt_ident() will die() on bad value */
>> -       if (parse_date(val[2], &out)){
>> +       if (parse_date(date, buf)){
> 
> Re-purposing the strbuf 'buf', which is passed into this function,
> binds this function too tightly with its caller by assuming that the
> caller will never need the original content of 'buf' anymore. Thus, it
> would be better for this code continue using its own local strbuf
> 'out' rather than re-purposing the incoming 'buf'.

That's a good point, I'll fix it.

>>                 warning(_("invalid date format '%s' in '%s'"),
>> -                       val[2], rebase_path_author_script());
>> -               strbuf_release(&out);
>> +                       date, rebase_path_author_script());
>> +               strbuf_release(buf);
> 
> Likewise, it's doubly odd to see this function releasing 'buf' which
> it does not own.
> 
>>                 return NULL;
>>         }
> 
> [1]: https://public-inbox.org/git/CAPig+cRvUr26GZyW6ecYhpwABueBqaEfZH1+JjLaqZo8+RTD6Q@mail.gmail.com/
> 

Thanks for looking at this, I'm keen to keep things simple and reuse the
am author-script parsing if possible. It is more restrictive but I'm not
sure that anyone is actually taking advantage of the flexibility offered
by the current setup and it fixes the de-quoting bugs in read_env_script().

Best Wishes

Phillip

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCH v2 0/5]  am/rebase: share read_author_script()
  2018-09-12 10:10 [PATCH 0/3] am/rebase: share read_author_script() Phillip Wood
                   ` (2 preceding siblings ...)
  2018-09-12 10:10 ` [PATCH 3/3] sequencer: use read_author_script() Phillip Wood
@ 2018-10-18 10:00 ` Phillip Wood
  2018-10-18 10:00   ` [PATCH v2 1/5] am: don't die in read_author_script() Phillip Wood
                     ` (5 more replies)
  2018-10-30 10:39 ` [PATCH v3 " Phillip Wood
  2018-10-31 10:15 ` [PATCH v4 " Phillip Wood
  5 siblings, 6 replies; 41+ messages in thread
From: Phillip Wood @ 2018-10-18 10:00 UTC (permalink / raw)
  To: Git Mailing List, Eric Sunshine
  Cc: Junio C Hamano, Johannes Schindelin, Phillip Wood

From: Phillip Wood <phillip.wood@dunelm.org.uk>

Thanks to Eric for his feedback on v1. I've rerolled based on
that. Patches 1 & 2 are new and try to address some of the concerns
Eric raised, particularly the error handling for a badly edited author
script. See the notes on patches 4 & 5 for the changes to those (they
were previously 2 & 3).

v1 cover letter:

This is a follow up to pw/rebase-i-author-script-fix, it reduces code
duplication and improves rebase's parsing of the author script. After
this I'll do another series to share the code to write the author
script.

Phillip Wood (5):
  am: don't die in read_author_script()
  am: improve author-script error reporting
  am: rename read_author_script()
  add read_author_script() to libgit
  sequencer: use read_author_script()

 builtin/am.c |  60 ++--------------
 sequencer.c  | 192 ++++++++++++++++++++++++++++++++-------------------
 sequencer.h  |   3 +
 3 files changed, 128 insertions(+), 127 deletions(-)

-- 
2.19.0


^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCH v2 1/5] am: don't die in read_author_script()
  2018-10-18 10:00 ` [PATCH v2 0/5] am/rebase: share read_author_script() Phillip Wood
@ 2018-10-18 10:00   ` Phillip Wood
  2018-10-25  8:44     ` Junio C Hamano
  2018-10-18 10:00   ` [PATCH v2 2/5] am: improve author-script error reporting Phillip Wood
                     ` (4 subsequent siblings)
  5 siblings, 1 reply; 41+ messages in thread
From: Phillip Wood @ 2018-10-18 10:00 UTC (permalink / raw)
  To: Git Mailing List, Eric Sunshine
  Cc: Junio C Hamano, Johannes Schindelin, Phillip Wood

From: Phillip Wood <phillip.wood@dunelm.org.uk>

The caller is already prepared to handle errors returned from this
function so there is no need for it to die if it cannot read the file.

Suggested-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
---
 builtin/am.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/builtin/am.c b/builtin/am.c
index 5e866d17c7..b68578bc3f 100644
--- a/builtin/am.c
+++ b/builtin/am.c
@@ -318,7 +318,8 @@ static int read_author_script(struct am_state *state)
 	if (fd < 0) {
 		if (errno == ENOENT)
 			return 0;
-		die_errno(_("could not open '%s' for reading"), filename);
+		return error_errno(_("could not open '%s' for reading"),
+				   filename);
 	}
 	strbuf_read(&buf, fd, 0);
 	close(fd);
-- 
2.19.0


^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCH v2 2/5] am: improve author-script error reporting
  2018-10-18 10:00 ` [PATCH v2 0/5] am/rebase: share read_author_script() Phillip Wood
  2018-10-18 10:00   ` [PATCH v2 1/5] am: don't die in read_author_script() Phillip Wood
@ 2018-10-18 10:00   ` Phillip Wood
  2018-10-25  8:55     ` Junio C Hamano
  2018-10-18 10:00   ` [PATCH v2 3/5] am: rename read_author_script() Phillip Wood
                     ` (3 subsequent siblings)
  5 siblings, 1 reply; 41+ messages in thread
From: Phillip Wood @ 2018-10-18 10:00 UTC (permalink / raw)
  To: Git Mailing List, Eric Sunshine
  Cc: Junio C Hamano, Johannes Schindelin, Phillip Wood

From: Phillip Wood <phillip.wood@dunelm.org.uk>

If there are errors in a user edited author-script there was no
indication of what was wrong. This commit adds some specific error messages
depending on the problem. It also relaxes the requirement that the
variables appear in a specific order in the file to match the behavior
of 'rebase --interactive'.

Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
---
 builtin/am.c | 49 +++++++++++++++++++++++++++++++++++++++----------
 1 file changed, 39 insertions(+), 10 deletions(-)

diff --git a/builtin/am.c b/builtin/am.c
index b68578bc3f..d42b725273 100644
--- a/builtin/am.c
+++ b/builtin/am.c
@@ -270,8 +270,11 @@ static int parse_key_value_squoted(char *buf, struct string_list *list)
 		struct string_list_item *item;
 		char *np;
 		char *cp = strchr(buf, '=');
-		if (!cp)
-			return -1;
+		if (!cp) {
+			np = strchrnul(buf, '\n');
+			return error(_("unable to parse '%.*s'"),
+				     (int) (np - buf), buf);
+		}
 		np = strchrnul(cp, '\n');
 		*cp++ = '\0';
 		item = string_list_append(list, buf);
@@ -280,7 +283,8 @@ static int parse_key_value_squoted(char *buf, struct string_list *list)
 		*np = '\0';
 		cp = sq_dequote(cp);
 		if (!cp)
-			return -1;
+			return error(_("unable to dequote value of '%s'"),
+				     item->string);
 		item->util = xstrdup(cp);
 	}
 	return 0;
@@ -308,6 +312,7 @@ static int read_author_script(struct am_state *state)
 	struct strbuf buf = STRBUF_INIT;
 	struct string_list kv = STRING_LIST_INIT_DUP;
 	int retval = -1; /* assume failure */
+	int i, name_i = -2, email_i = -2, date_i = -2, err = 0;
 	int fd;
 
 	assert(!state->author_name);
@@ -326,14 +331,38 @@ static int read_author_script(struct am_state *state)
 	if (parse_key_value_squoted(buf.buf, &kv))
 		goto finish;
 
-	if (kv.nr != 3 ||
-	    strcmp(kv.items[0].string, "GIT_AUTHOR_NAME") ||
-	    strcmp(kv.items[1].string, "GIT_AUTHOR_EMAIL") ||
-	    strcmp(kv.items[2].string, "GIT_AUTHOR_DATE"))
+	for (i = 0; i < kv.nr; i++) {
+		if (!strcmp(kv.items[i].string, "GIT_AUTHOR_NAME")) {
+			if (name_i >= 0)
+				name_i = error(_("'GIT_AUTHOR_NAME' already given"));
+			else
+				name_i = i;
+		} else if (!strcmp(kv.items[i].string, "GIT_AUTHOR_EMAIL")) {
+			if (email_i >= 0)
+				email_i = error(_("'GIT_AUTHOR_EMAIL' already given"));
+			else
+				email_i = i;
+		} else if (!strcmp(kv.items[i].string, "GIT_AUTHOR_DATE")) {
+			if (date_i >= 0)
+				date_i = error(_("'GIT_AUTHOR_DATE' already given"));
+			else
+				date_i = i;
+		} else {
+			err = error(_("unknown variable '%s'"),
+				    kv.items[i].string);
+		}
+	}
+	if (name_i == -2)
+		error(_("missing 'GIT_AUTHOR_NAME'"));
+	if (email_i == -2)
+		error(_("missing 'GIT_AUTHOR_EMAIL'"));
+	if (date_i == -2)
+		error(_("missing 'GIT_AUTHOR_DATE'"));
+	if (date_i < 0 || email_i < 0 || date_i < 0 || err)
 		goto finish;
-	state->author_name = kv.items[0].util;
-	state->author_email = kv.items[1].util;
-	state->author_date = kv.items[2].util;
+	state->author_name = kv.items[name_i].util;
+	state->author_email = kv.items[email_i].util;
+	state->author_date = kv.items[date_i].util;
 	retval = 0;
 finish:
 	string_list_clear(&kv, !!retval);
-- 
2.19.0


^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCH v2 3/5] am: rename read_author_script()
  2018-10-18 10:00 ` [PATCH v2 0/5] am/rebase: share read_author_script() Phillip Wood
  2018-10-18 10:00   ` [PATCH v2 1/5] am: don't die in read_author_script() Phillip Wood
  2018-10-18 10:00   ` [PATCH v2 2/5] am: improve author-script error reporting Phillip Wood
@ 2018-10-18 10:00   ` Phillip Wood
  2018-10-18 10:00   ` [PATCH v2 4/5] add read_author_script() to libgit Phillip Wood
                     ` (2 subsequent siblings)
  5 siblings, 0 replies; 41+ messages in thread
From: Phillip Wood @ 2018-10-18 10:00 UTC (permalink / raw)
  To: Git Mailing List, Eric Sunshine
  Cc: Junio C Hamano, Johannes Schindelin, Phillip Wood

From: Phillip Wood <phillip.wood@dunelm.org.uk>

Rename read_author_script() in preparation for adding a shared
read_author_script() function to libgit.

Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
---
 builtin/am.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/builtin/am.c b/builtin/am.c
index d42b725273..991d13f9a2 100644
--- a/builtin/am.c
+++ b/builtin/am.c
@@ -306,7 +306,7 @@ static int parse_key_value_squoted(char *buf, struct string_list *list)
  * script, and thus if the file differs from what this function expects, it is
  * better to bail out than to do something that the user does not expect.
  */
-static int read_author_script(struct am_state *state)
+static int read_am_author_script(struct am_state *state)
 {
 	const char *filename = am_path(state, "author-script");
 	struct strbuf buf = STRBUF_INIT;
@@ -441,7 +441,7 @@ static void am_load(struct am_state *state)
 		BUG("state file 'last' does not exist");
 	state->last = strtol(sb.buf, NULL, 10);
 
-	if (read_author_script(state) < 0)
+	if (read_am_author_script(state) < 0)
 		die(_("could not parse author script"));
 
 	read_commit_msg(state);
-- 
2.19.0


^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCH v2 4/5] add read_author_script() to libgit
  2018-10-18 10:00 ` [PATCH v2 0/5] am/rebase: share read_author_script() Phillip Wood
                     ` (2 preceding siblings ...)
  2018-10-18 10:00   ` [PATCH v2 3/5] am: rename read_author_script() Phillip Wood
@ 2018-10-18 10:00   ` Phillip Wood
  2018-10-18 10:00   ` [PATCH v2 5/5] sequencer: use read_author_script() Phillip Wood
  2018-10-25  8:59   ` [PATCH v2 0/5] am/rebase: share read_author_script() Junio C Hamano
  5 siblings, 0 replies; 41+ messages in thread
From: Phillip Wood @ 2018-10-18 10:00 UTC (permalink / raw)
  To: Git Mailing List, Eric Sunshine
  Cc: Junio C Hamano, Johannes Schindelin, Phillip Wood

From: Phillip Wood <phillip.wood@dunelm.org.uk>

Add read_author_script() to sequencer.c based on the implementation in
builtin/am.c and update read_am_author_script() to use
read_author_script(). The sequencer code that reads the author script
will be updated in the next commit.

Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
---

Notes:
    changes since v1:
     - added comment above read_author_script()
     - rebased to reflect changes added in patch 2

 builtin/am.c |  86 +----------------------------------------
 sequencer.c  | 105 +++++++++++++++++++++++++++++++++++++++++++++++++++
 sequencer.h  |   3 ++
 3 files changed, 110 insertions(+), 84 deletions(-)

diff --git a/builtin/am.c b/builtin/am.c
index 991d13f9a2..c5373158c0 100644
--- a/builtin/am.c
+++ b/builtin/am.c
@@ -260,36 +260,6 @@ static int read_state_file(struct strbuf *sb, const struct am_state *state,
 	die_errno(_("could not read '%s'"), am_path(state, file));
 }
 
-/**
- * Take a series of KEY='VALUE' lines where VALUE part is
- * sq-quoted, and append <KEY, VALUE> at the end of the string list
- */
-static int parse_key_value_squoted(char *buf, struct string_list *list)
-{
-	while (*buf) {
-		struct string_list_item *item;
-		char *np;
-		char *cp = strchr(buf, '=');
-		if (!cp) {
-			np = strchrnul(buf, '\n');
-			return error(_("unable to parse '%.*s'"),
-				     (int) (np - buf), buf);
-		}
-		np = strchrnul(cp, '\n');
-		*cp++ = '\0';
-		item = string_list_append(list, buf);
-
-		buf = np + (*np == '\n');
-		*np = '\0';
-		cp = sq_dequote(cp);
-		if (!cp)
-			return error(_("unable to dequote value of '%s'"),
-				     item->string);
-		item->util = xstrdup(cp);
-	}
-	return 0;
-}
-
 /**
  * Reads and parses the state directory's "author-script" file, and sets
  * state->author_name, state->author_email and state->author_date accordingly.
@@ -309,65 +279,13 @@ static int parse_key_value_squoted(char *buf, struct string_list *list)
 static int read_am_author_script(struct am_state *state)
 {
 	const char *filename = am_path(state, "author-script");
-	struct strbuf buf = STRBUF_INIT;
-	struct string_list kv = STRING_LIST_INIT_DUP;
-	int retval = -1; /* assume failure */
-	int i, name_i = -2, email_i = -2, date_i = -2, err = 0;
-	int fd;
 
 	assert(!state->author_name);
 	assert(!state->author_email);
 	assert(!state->author_date);
 
-	fd = open(filename, O_RDONLY);
-	if (fd < 0) {
-		if (errno == ENOENT)
-			return 0;
-		return error_errno(_("could not open '%s' for reading"),
-				   filename);
-	}
-	strbuf_read(&buf, fd, 0);
-	close(fd);
-	if (parse_key_value_squoted(buf.buf, &kv))
-		goto finish;
-
-	for (i = 0; i < kv.nr; i++) {
-		if (!strcmp(kv.items[i].string, "GIT_AUTHOR_NAME")) {
-			if (name_i >= 0)
-				name_i = error(_("'GIT_AUTHOR_NAME' already given"));
-			else
-				name_i = i;
-		} else if (!strcmp(kv.items[i].string, "GIT_AUTHOR_EMAIL")) {
-			if (email_i >= 0)
-				email_i = error(_("'GIT_AUTHOR_EMAIL' already given"));
-			else
-				email_i = i;
-		} else if (!strcmp(kv.items[i].string, "GIT_AUTHOR_DATE")) {
-			if (date_i >= 0)
-				date_i = error(_("'GIT_AUTHOR_DATE' already given"));
-			else
-				date_i = i;
-		} else {
-			err = error(_("unknown variable '%s'"),
-				    kv.items[i].string);
-		}
-	}
-	if (name_i == -2)
-		error(_("missing 'GIT_AUTHOR_NAME'"));
-	if (email_i == -2)
-		error(_("missing 'GIT_AUTHOR_EMAIL'"));
-	if (date_i == -2)
-		error(_("missing 'GIT_AUTHOR_DATE'"));
-	if (date_i < 0 || email_i < 0 || date_i < 0 || err)
-		goto finish;
-	state->author_name = kv.items[name_i].util;
-	state->author_email = kv.items[email_i].util;
-	state->author_date = kv.items[date_i].util;
-	retval = 0;
-finish:
-	string_list_clear(&kv, !!retval);
-	strbuf_release(&buf);
-	return retval;
+	return read_author_script(filename, &state->author_name,
+				  &state->author_email, &state->author_date, 1);
 }
 
 /**
diff --git a/sequencer.c b/sequencer.c
index dc2c58d464..3530dbeb6c 100644
--- a/sequencer.c
+++ b/sequencer.c
@@ -660,6 +660,111 @@ static int write_author_script(const char *message)
 	return res;
 }
 
+/**
+ * Take a series of KEY='VALUE' lines where VALUE part is
+ * sq-quoted, and append <KEY, VALUE> at the end of the string list
+ */
+static int parse_key_value_squoted(char *buf, struct string_list *list)
+{
+	while (*buf) {
+		struct string_list_item *item;
+		char *np;
+		char *cp = strchr(buf, '=');
+		if (!cp) {
+			np = strchrnul(buf, '\n');
+			return error(_("unable to parse '%.*s'"),
+				     (int) (np - buf), buf);
+		}
+		np = strchrnul(cp, '\n');
+		*cp++ = '\0';
+		item = string_list_append(list, buf);
+
+		buf = np + (*np == '\n');
+		*np = '\0';
+		cp = sq_dequote(cp);
+		if (!cp)
+			return error(_("unable to dequote value of '%s'"),
+				     item->string);
+		item->util = xstrdup(cp);
+	}
+	return 0;
+}
+
+/**
+ * Reads and parses the state directory's "author-script" file, and sets name,
+ * email and date accordingly.
+ * Returns 0 on success, -1 if the file could not be parsed.
+ *
+ * The author script is of the format:
+ *
+ *	GIT_AUTHOR_NAME='$author_name'
+ *	GIT_AUTHOR_EMAIL='$author_email'
+ *	GIT_AUTHOR_DATE='$author_date'
+ *
+ * where $author_name, $author_email and $author_date are quoted. We are strict
+ * with our parsing, as the file was meant to be eval'd in the old
+ * git-am.sh/git-rebase--interactive.sh scripts, and thus if the file differs
+ * from what this function expects, it is better to bail out than to do
+ * something that the user does not expect.
+ */
+int read_author_script(const char *path, char **name, char **email, char **date,
+		       int allow_missing)
+{
+	struct strbuf buf = STRBUF_INIT;
+	struct string_list kv = STRING_LIST_INIT_DUP;
+	int retval = -1; /* assume failure */
+	int i, name_i = -2, email_i = -2, date_i = -2, err = 0;
+
+	if (strbuf_read_file(&buf, path, 256) <= 0) {
+		strbuf_release(&buf);
+		if (errno == ENOENT && allow_missing)
+			return 0;
+		else
+			return error_errno(_("could not open '%s' for reading"),
+					   path);
+	}
+
+	if (parse_key_value_squoted(buf.buf, &kv))
+		goto finish;
+
+	for (i = 0; i < kv.nr; i++) {
+		if (!strcmp(kv.items[i].string, "GIT_AUTHOR_NAME")) {
+			if (name_i >= 0)
+				name_i = error(_("'GIT_AUTHOR_NAME' already given"));
+			else
+				name_i = i;
+		} else if (!strcmp(kv.items[i].string, "GIT_AUTHOR_EMAIL")) {
+			if (email_i >= 0)
+				email_i = error(_("'GIT_AUTHOR_EMAIL' already given"));
+			else
+				email_i = i;
+		} else if (!strcmp(kv.items[i].string, "GIT_AUTHOR_DATE")) {
+			if (date_i >= 0)
+				date_i = error(_("'GIT_AUTHOR_DATE' already given"));
+			else
+				date_i = i;
+		} else {
+			err = error(_("unknown variable '%s'"),
+				    kv.items[i].string);
+		}
+	}
+	if (name_i == -2)
+		error(_("missing 'GIT_AUTHOR_NAME'"));
+	if (email_i == -2)
+		error(_("missing 'GIT_AUTHOR_EMAIL'"));
+	if (date_i == -2)
+		error(_("missing 'GIT_AUTHOR_DATE'"));
+	if (date_i < 0 || email_i < 0 || date_i < 0 || err)
+		goto finish;
+	*name = kv.items[name_i].util;
+	*email = kv.items[email_i].util;
+	*date = kv.items[date_i].util;
+	retval = 0;
+finish:
+	string_list_clear(&kv, !!retval);
+	strbuf_release(&buf);
+	return retval;
+}
 
 /*
  * write_author_script() used to fail to terminate the last line with a "'" and
diff --git a/sequencer.h b/sequencer.h
index c751c9d6e4..3713f955f5 100644
--- a/sequencer.h
+++ b/sequencer.h
@@ -107,4 +107,7 @@ void commit_post_rewrite(const struct commit *current_head,
 #define SUMMARY_SHOW_AUTHOR_DATE (1 << 1)
 void print_commit_summary(const char *prefix, const struct object_id *oid,
 			  unsigned int flags);
+
+int read_author_script(const char *path, char **name, char **email, char **date,
+		       int allow_missing);
 #endif
-- 
2.19.0


^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCH v2 5/5] sequencer: use read_author_script()
  2018-10-18 10:00 ` [PATCH v2 0/5] am/rebase: share read_author_script() Phillip Wood
                     ` (3 preceding siblings ...)
  2018-10-18 10:00   ` [PATCH v2 4/5] add read_author_script() to libgit Phillip Wood
@ 2018-10-18 10:00   ` Phillip Wood
  2018-10-25  8:59   ` [PATCH v2 0/5] am/rebase: share read_author_script() Junio C Hamano
  5 siblings, 0 replies; 41+ messages in thread
From: Phillip Wood @ 2018-10-18 10:00 UTC (permalink / raw)
  To: Git Mailing List, Eric Sunshine
  Cc: Junio C Hamano, Johannes Schindelin, Phillip Wood

From: Phillip Wood <phillip.wood@dunelm.org.uk>

Use the new function added in the last commit to read the author
script, updating read_env_script() and read_author_ident(). We now
have a single code path that reads the author script for am and all
flavors of rebase. This changes the behavior of read_env_script() as
previously it would set any environment variables that were in the
author-script file. Now it is an error if the file contains other
variables or any of GIT_AUTHOR_NAME, GIT_AUTHOR_EMAIL and
GIT_AUTHOR_DATE are missing. This is what am and the non interactive
version of rebase have been doing for several years so hopefully it
will not cause a problem for interactive rebase users. The advantage
is that we are reusing existing code from am which uses sq_dequote()
to properly dequote variables. This fixes potential problems with user
edited scripts as read_env_script() which did not track quotes
properly.

This commit also removes the fallback code for checking for a broken
author script after git is upgraded when a rebase is stopped. Now that
the parsing uses sq_dequote() it will reliably return an error if the
quoting is broken and the user will have to abort the rebase and
restart. This isn't ideal but it's a corner case and the detection of
the broken quoting could be confused by user edited author scripts.

Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
---

Notes:
    changes since v1
     - use argv_array_pushf() as suggested by Eric
     - fixed strbuf handling as suggested by Eric
     - fix comments and commit message to reflect changed behavior of
       read_env_script()

 sequencer.c | 97 ++++++++++++-----------------------------------------
 1 file changed, 21 insertions(+), 76 deletions(-)

diff --git a/sequencer.c b/sequencer.c
index 3530dbeb6c..987542f67c 100644
--- a/sequencer.c
+++ b/sequencer.c
@@ -767,53 +767,24 @@ int read_author_script(const char *path, char **name, char **email, char **date,
 }
 
 /*
- * write_author_script() used to fail to terminate the last line with a "'" and
- * also escaped "'" incorrectly as "'\\\\''" rather than "'\\''". We check for
- * the terminating "'" on the last line to see how "'" has been escaped in case
- * git was upgraded while rebase was stopped.
- */
-static int quoting_is_broken(const char *s, size_t n)
-{
-	/* Skip any empty lines in case the file was hand edited */
-	while (n > 0 && s[--n] == '\n')
-		; /* empty */
-	if (n > 0 && s[n] != '\'')
-		return 1;
-
-	return 0;
-}
-
-/*
- * Read a list of environment variable assignments (such as the author-script
- * file) into an environment block. Returns -1 on error, 0 otherwise.
+ * Read a GIT_AUTHOR_NAME, GIT_AUTHOR_EMAIL AND GIT_AUTHOR_DATE from a
+ * file with shell quoting into struct argv_array. Returns -1 on
+ * error, 0 otherwise.
  */
 static int read_env_script(struct argv_array *env)
 {
-	struct strbuf script = STRBUF_INIT;
-	int i, count = 0, sq_bug;
-	const char *p2;
-	char *p;
+	char *name, *email, *date;
 
-	if (strbuf_read_file(&script, rebase_path_author_script(), 256) <= 0)
+	if (read_author_script(rebase_path_author_script(),
+			       &name, &email, &date, 0))
 		return -1;
-	/* write_author_script() used to quote incorrectly */
-	sq_bug = quoting_is_broken(script.buf, script.len);
-	for (p = script.buf; *p; p++)
-		if (sq_bug && skip_prefix(p, "'\\\\''", &p2))
-			strbuf_splice(&script, p - script.buf, p2 - p, "'", 1);
-		else if (skip_prefix(p, "'\\''", &p2))
-			strbuf_splice(&script, p - script.buf, p2 - p, "'", 1);
-		else if (*p == '\'')
-			strbuf_splice(&script, p-- - script.buf, 1, "", 0);
-		else if (*p == '\n') {
-			*p = '\0';
-			count++;
-		}
 
-	for (i = 0, p = script.buf; i < count; i++) {
-		argv_array_push(env, p);
-		p += strlen(p) + 1;
-	}
+	argv_array_pushf(env, "GIT_AUTHOR_NAME=%s", name);
+	argv_array_pushf(env, "GIT_AUTHOR_EMAIL=%s", email);
+	argv_array_pushf(env, "GIT_AUTHOR_DATE=%s", date);
+	free(name);
+	free(email);
+	free(date);
 
 	return 0;
 }
@@ -833,54 +804,28 @@ static char *get_author(const char *message)
 /* Read author-script and return an ident line (author <email> timestamp) */
 static const char *read_author_ident(struct strbuf *buf)
 {
-	const char *keys[] = {
-		"GIT_AUTHOR_NAME=", "GIT_AUTHOR_EMAIL=", "GIT_AUTHOR_DATE="
-	};
 	struct strbuf out = STRBUF_INIT;
-	char *in, *eol;
-	const char *val[3];
-	int i = 0;
+	char *name, *email, *date;
 
-	if (strbuf_read_file(buf, rebase_path_author_script(), 256) <= 0)
+	if (read_author_script(rebase_path_author_script(),
+			       &name, &email, &date, 0))
 		return NULL;
 
-	/* dequote values and construct ident line in-place */
-	for (in = buf->buf; i < 3 && in - buf->buf < buf->len; i++) {
-		if (!skip_prefix(in, keys[i], (const char **)&in)) {
-			warning(_("could not parse '%s' (looking for '%s')"),
-				rebase_path_author_script(), keys[i]);
-			return NULL;
-		}
-
-		eol = strchrnul(in, '\n');
-		*eol = '\0';
-		if (!sq_dequote(in)) {
-			warning(_("bad quoting on %s value in '%s'"),
-				keys[i], rebase_path_author_script());
-			return NULL;
-		}
-		val[i] = in;
-		in = eol + 1;
-	}
-
-	if (i < 3) {
-		warning(_("could not parse '%s' (looking for '%s')"),
-			rebase_path_author_script(), keys[i]);
-		return NULL;
-	}
-
 	/* validate date since fmt_ident() will die() on bad value */
-	if (parse_date(val[2], &out)){
+	if (parse_date(date, &out)){
 		warning(_("invalid date format '%s' in '%s'"),
-			val[2], rebase_path_author_script());
+			date, rebase_path_author_script());
 		strbuf_release(&out);
 		return NULL;
 	}
 
 	strbuf_reset(&out);
-	strbuf_addstr(&out, fmt_ident(val[0], val[1], val[2], 0));
+	strbuf_addstr(&out, fmt_ident(name, email, date, 0));
 	strbuf_swap(buf, &out);
 	strbuf_release(&out);
+	free(name);
+	free(email);
+	free(date);
 	return buf->buf;
 }
 
-- 
2.19.0


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v2 1/5] am: don't die in read_author_script()
  2018-10-18 10:00   ` [PATCH v2 1/5] am: don't die in read_author_script() Phillip Wood
@ 2018-10-25  8:44     ` Junio C Hamano
  0 siblings, 0 replies; 41+ messages in thread
From: Junio C Hamano @ 2018-10-25  8:44 UTC (permalink / raw)
  To: Phillip Wood
  Cc: Git Mailing List, Eric Sunshine, Johannes Schindelin, Phillip Wood

Phillip Wood <phillip.wood@talktalk.net> writes:

> From: Phillip Wood <phillip.wood@dunelm.org.uk>
>
> The caller is already prepared to handle errors returned from this
> function so there is no need for it to die if it cannot read the file.
>
> Suggested-by: Eric Sunshine <sunshine@sunshineco.com>
> Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
> ---
>  builtin/am.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)

OK.  Having fewer die's in reusable codepath can only be a good
thing.

>
> diff --git a/builtin/am.c b/builtin/am.c
> index 5e866d17c7..b68578bc3f 100644
> --- a/builtin/am.c
> +++ b/builtin/am.c
> @@ -318,7 +318,8 @@ static int read_author_script(struct am_state *state)
>  	if (fd < 0) {
>  		if (errno == ENOENT)
>  			return 0;
> -		die_errno(_("could not open '%s' for reading"), filename);
> +		return error_errno(_("could not open '%s' for reading"),
> +				   filename);
>  	}
>  	strbuf_read(&buf, fd, 0);
>  	close(fd);

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v2 2/5] am: improve author-script error reporting
  2018-10-18 10:00   ` [PATCH v2 2/5] am: improve author-script error reporting Phillip Wood
@ 2018-10-25  8:55     ` Junio C Hamano
  0 siblings, 0 replies; 41+ messages in thread
From: Junio C Hamano @ 2018-10-25  8:55 UTC (permalink / raw)
  To: Phillip Wood
  Cc: Git Mailing List, Eric Sunshine, Johannes Schindelin, Phillip Wood

Phillip Wood <phillip.wood@talktalk.net> writes:

> From: Phillip Wood <phillip.wood@dunelm.org.uk>
>
> If there are errors in a user edited author-script there was no
> indication of what was wrong. This commit adds some specific error messages
> depending on the problem. It also relaxes the requirement that the
> variables appear in a specific order in the file to match the behavior
> of 'rebase --interactive'.

That relaxing is sensible; there is no reason to insist that the
file we are reading was written by exactly the same writer as we
have.

> diff --git a/builtin/am.c b/builtin/am.c
> index b68578bc3f..d42b725273 100644
> --- a/builtin/am.c
> +++ b/builtin/am.c
> @@ -270,8 +270,11 @@ static int parse_key_value_squoted(char *buf, struct string_list *list)
>  		struct string_list_item *item;
>  		char *np;
>  		char *cp = strchr(buf, '=');
> -		if (!cp)
> -			return -1;
> +		if (!cp) {
> +			np = strchrnul(buf, '\n');
> +			return error(_("unable to parse '%.*s'"),
> +				     (int) (np - buf), buf);
> +		}

We are unable to parse because it is not of KEY='VALUE' form.  Is
that something worth reporting, e.g. "no key present in '%.*s'"?

> @@ -280,7 +283,8 @@ static int parse_key_value_squoted(char *buf, struct string_list *list)
>  		*np = '\0';
>  		cp = sq_dequote(cp);
>  		if (!cp)
> -			return -1;
> +			return error(_("unable to dequote value of '%s'"),
> +				     item->string);

At this point, item->string is what we earlier found on the left
hand side of the '=', i.e. the key.  The message makes sense.

> @@ -308,6 +312,7 @@ static int read_author_script(struct am_state *state)
>  	struct strbuf buf = STRBUF_INIT;
>  	struct string_list kv = STRING_LIST_INIT_DUP;
>  	int retval = -1; /* assume failure */
> +	int i, name_i = -2, email_i = -2, date_i = -2, err = 0;

That -2 is somewhat cute.  If we find a dup, then it will become -1
so later check for missing field would not trigger.

> @@ -326,14 +331,38 @@ static int read_author_script(struct am_state *state)
>  	if (parse_key_value_squoted(buf.buf, &kv))
>  		goto finish;
>  
> -	if (kv.nr != 3 ||
> -	    strcmp(kv.items[0].string, "GIT_AUTHOR_NAME") ||
> -	    strcmp(kv.items[1].string, "GIT_AUTHOR_EMAIL") ||
> -	    strcmp(kv.items[2].string, "GIT_AUTHOR_DATE"))
> +	for (i = 0; i < kv.nr; i++) {
> +		if (!strcmp(kv.items[i].string, "GIT_AUTHOR_NAME")) {
> +			if (name_i >= 0)
> +				name_i = error(_("'GIT_AUTHOR_NAME' already given"));
> +			else
> +				name_i = i;

However, if you have three instances of GIT_AUTHOR_NAME, then

 - the first one makes name_i point at it
 - the second one triggers an error and name_i becomes -1
 - the third one makes name_i point at it again

And name_i is not -2 so we won't give "missing" error, which is OK,
but we end up having a usable name even though we said we detected
duplicate!

You can probably compare name_i with -2 when detecting the dup to
fix it, i.e.

	if (name_i != -2)
		name_i = error("found dup");
	else
		name_i = i;

> +		} else if (!strcmp(kv.items[i].string, "GIT_AUTHOR_EMAIL")) {
> +			if (email_i >= 0)
> +				email_i = error(_("'GIT_AUTHOR_EMAIL' already given"));
> +			else
> +				email_i = i;
> +		} else if (!strcmp(kv.items[i].string, "GIT_AUTHOR_DATE")) {
> +			if (date_i >= 0)
> +				date_i = error(_("'GIT_AUTHOR_DATE' already given"));
> +			else
> +				date_i = i;
> +		} else {
> +			err = error(_("unknown variable '%s'"),
> +				    kv.items[i].string);
> +		}
> +	}
> +	if (name_i == -2)
> +		error(_("missing 'GIT_AUTHOR_NAME'"));
> +	if (email_i == -2)
> +		error(_("missing 'GIT_AUTHOR_EMAIL'"));
> +	if (date_i == -2)
> +		error(_("missing 'GIT_AUTHOR_DATE'"));
> +	if (date_i < 0 || email_i < 0 || date_i < 0 || err)
>  		goto finish;
> -	state->author_name = kv.items[0].util;
> -	state->author_email = kv.items[1].util;
> -	state->author_date = kv.items[2].util;
> +	state->author_name = kv.items[name_i].util;
> +	state->author_email = kv.items[email_i].util;
> +	state->author_date = kv.items[date_i].util;
>  	retval = 0;
>  finish:
>  	string_list_clear(&kv, !!retval);

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v2 0/5]  am/rebase: share read_author_script()
  2018-10-18 10:00 ` [PATCH v2 0/5] am/rebase: share read_author_script() Phillip Wood
                     ` (4 preceding siblings ...)
  2018-10-18 10:00   ` [PATCH v2 5/5] sequencer: use read_author_script() Phillip Wood
@ 2018-10-25  8:59   ` Junio C Hamano
  2018-10-26  9:00     ` Phillip Wood
  5 siblings, 1 reply; 41+ messages in thread
From: Junio C Hamano @ 2018-10-25  8:59 UTC (permalink / raw)
  To: Phillip Wood
  Cc: Git Mailing List, Eric Sunshine, Johannes Schindelin, Phillip Wood

Phillip Wood <phillip.wood@talktalk.net> writes:

> From: Phillip Wood <phillip.wood@dunelm.org.uk>
>
> Thanks to Eric for his feedback on v1. I've rerolled based on
> that. Patches 1 & 2 are new and try to address some of the concerns
> Eric raised, particularly the error handling for a badly edited author
> script. See the notes on patches 4 & 5 for the changes to those (they
> were previously 2 & 3).

I spotted a weird corner case buglet, but it seems that this one is
ready for 'next' even without fixing that "give it three times and
we will happily continue" thing.

Do we know of any other issues?  Can we now move it forward?

Thanks.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v2 0/5] am/rebase: share read_author_script()
  2018-10-25  8:59   ` [PATCH v2 0/5] am/rebase: share read_author_script() Junio C Hamano
@ 2018-10-26  9:00     ` Phillip Wood
  2018-10-26 13:32       ` Junio C Hamano
  0 siblings, 1 reply; 41+ messages in thread
From: Phillip Wood @ 2018-10-26  9:00 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Git Mailing List, Eric Sunshine, Johannes Schindelin, Phillip Wood

Hi Junio

On 25/10/2018 09:59, Junio C Hamano wrote:
> Phillip Wood <phillip.wood@talktalk.net> writes:
> 
>> From: Phillip Wood <phillip.wood@dunelm.org.uk>
>>
>> Thanks to Eric for his feedback on v1. I've rerolled based on
>> that. Patches 1 & 2 are new and try to address some of the concerns
>> Eric raised, particularly the error handling for a badly edited author
>> script. See the notes on patches 4 & 5 for the changes to those (they
>> were previously 2 & 3).
> 
> I spotted a weird corner case buglet, but it seems that this one is
> ready for 'next' even without fixing that "give it three times and
> we will happily continue" thing.

Well spotted on the corner case. If you're happy to hold off on moving
it to next I can send a re-roll with fixes for that next week or I can
do it as a fixup if you want to move this forward now.

> Do we know of any other issues?  Can we now move it forward?

I don't know of any other issues but I don't think anyone else has
looked at this iteration. Thanks for taking a look at these.

Best Wishes

Phillip



> Thanks.
> 


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v2 0/5] am/rebase: share read_author_script()
  2018-10-26  9:00     ` Phillip Wood
@ 2018-10-26 13:32       ` Junio C Hamano
  0 siblings, 0 replies; 41+ messages in thread
From: Junio C Hamano @ 2018-10-26 13:32 UTC (permalink / raw)
  To: Phillip Wood
  Cc: Git Mailing List, Eric Sunshine, Johannes Schindelin, Phillip Wood

Phillip Wood <phillip.wood@talktalk.net> writes:

>> I spotted a weird corner case buglet, but it seems that this one is
>> ready for 'next' even without fixing that "give it three times and
>> we will happily continue" thing.
>
> Well spotted on the corner case. If you're happy to hold off on moving
> it to next I can send a re-roll with fixes for that next week or I can
> do it as a fixup if you want to move this forward now.

OK, I'll mark it as "expecting a reroll".


^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCH v3 0/5] am/rebase: share read_author_script()
  2018-09-12 10:10 [PATCH 0/3] am/rebase: share read_author_script() Phillip Wood
                   ` (3 preceding siblings ...)
  2018-10-18 10:00 ` [PATCH v2 0/5] am/rebase: share read_author_script() Phillip Wood
@ 2018-10-30 10:39 ` " Phillip Wood
  2018-10-30 10:39   ` [PATCH v3 1/5] am: don't die in read_author_script() Phillip Wood
                     ` (5 more replies)
  2018-10-31 10:15 ` [PATCH v4 " Phillip Wood
  5 siblings, 6 replies; 41+ messages in thread
From: Phillip Wood @ 2018-10-30 10:39 UTC (permalink / raw)
  To: Git Mailing List, Junio C Hamano
  Cc: Eric Sunshine, Johannes Schindelin, Phillip Wood

From: Phillip Wood <phillip.wood@dunelm.org.uk>

Thanks to Junio for the feedback on v2. I've updated patch 4 based on
those comments, the rest are unchanged.

v1 cover letter:

This is a follow up to pw/rebase-i-author-script-fix, it reduces code
duplication and improves rebase's parsing of the author script. After
this I'll do another series to share the code to write the author
script.

Phillip Wood (5):
  am: don't die in read_author_script()
  am: improve author-script error reporting
  am: rename read_author_script()
  add read_author_script() to libgit
  sequencer: use read_author_script()

 builtin/am.c |  60 ++--------------
 sequencer.c  | 192 ++++++++++++++++++++++++++++++++-------------------
 sequencer.h  |   3 +
 3 files changed, 128 insertions(+), 127 deletions(-)

-- 
2.19.1


^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCH v3 1/5] am: don't die in read_author_script()
  2018-10-30 10:39 ` [PATCH v3 " Phillip Wood
@ 2018-10-30 10:39   ` Phillip Wood
  2018-10-30 10:39   ` [PATCH v3 2/5] am: improve author-script error reporting Phillip Wood
                     ` (4 subsequent siblings)
  5 siblings, 0 replies; 41+ messages in thread
From: Phillip Wood @ 2018-10-30 10:39 UTC (permalink / raw)
  To: Git Mailing List, Junio C Hamano
  Cc: Eric Sunshine, Johannes Schindelin, Phillip Wood

From: Phillip Wood <phillip.wood@dunelm.org.uk>

The caller is already prepared to handle errors returned from this
function so there is no need for it to die if it cannot read the file.

Suggested-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
---
 builtin/am.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/builtin/am.c b/builtin/am.c
index 5e866d17c7..b68578bc3f 100644
--- a/builtin/am.c
+++ b/builtin/am.c
@@ -318,7 +318,8 @@ static int read_author_script(struct am_state *state)
 	if (fd < 0) {
 		if (errno == ENOENT)
 			return 0;
-		die_errno(_("could not open '%s' for reading"), filename);
+		return error_errno(_("could not open '%s' for reading"),
+				   filename);
 	}
 	strbuf_read(&buf, fd, 0);
 	close(fd);
-- 
2.19.1


^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCH v3 2/5] am: improve author-script error reporting
  2018-10-30 10:39 ` [PATCH v3 " Phillip Wood
  2018-10-30 10:39   ` [PATCH v3 1/5] am: don't die in read_author_script() Phillip Wood
@ 2018-10-30 10:39   ` Phillip Wood
  2018-10-30 10:39   ` [PATCH v3 3/5] am: rename read_author_script() Phillip Wood
                     ` (3 subsequent siblings)
  5 siblings, 0 replies; 41+ messages in thread
From: Phillip Wood @ 2018-10-30 10:39 UTC (permalink / raw)
  To: Git Mailing List, Junio C Hamano
  Cc: Eric Sunshine, Johannes Schindelin, Phillip Wood

From: Phillip Wood <phillip.wood@dunelm.org.uk>

If there are errors in a user edited author-script there was no
indication of what was wrong. This commit adds some specific error messages
depending on the problem. It also relaxes the requirement that the
variables appear in a specific order in the file to match the behavior
of 'rebase --interactive'.

Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
---
 builtin/am.c | 49 +++++++++++++++++++++++++++++++++++++++----------
 1 file changed, 39 insertions(+), 10 deletions(-)

diff --git a/builtin/am.c b/builtin/am.c
index b68578bc3f..d42b725273 100644
--- a/builtin/am.c
+++ b/builtin/am.c
@@ -270,8 +270,11 @@ static int parse_key_value_squoted(char *buf, struct string_list *list)
 		struct string_list_item *item;
 		char *np;
 		char *cp = strchr(buf, '=');
-		if (!cp)
-			return -1;
+		if (!cp) {
+			np = strchrnul(buf, '\n');
+			return error(_("unable to parse '%.*s'"),
+				     (int) (np - buf), buf);
+		}
 		np = strchrnul(cp, '\n');
 		*cp++ = '\0';
 		item = string_list_append(list, buf);
@@ -280,7 +283,8 @@ static int parse_key_value_squoted(char *buf, struct string_list *list)
 		*np = '\0';
 		cp = sq_dequote(cp);
 		if (!cp)
-			return -1;
+			return error(_("unable to dequote value of '%s'"),
+				     item->string);
 		item->util = xstrdup(cp);
 	}
 	return 0;
@@ -308,6 +312,7 @@ static int read_author_script(struct am_state *state)
 	struct strbuf buf = STRBUF_INIT;
 	struct string_list kv = STRING_LIST_INIT_DUP;
 	int retval = -1; /* assume failure */
+	int i, name_i = -2, email_i = -2, date_i = -2, err = 0;
 	int fd;
 
 	assert(!state->author_name);
@@ -326,14 +331,38 @@ static int read_author_script(struct am_state *state)
 	if (parse_key_value_squoted(buf.buf, &kv))
 		goto finish;
 
-	if (kv.nr != 3 ||
-	    strcmp(kv.items[0].string, "GIT_AUTHOR_NAME") ||
-	    strcmp(kv.items[1].string, "GIT_AUTHOR_EMAIL") ||
-	    strcmp(kv.items[2].string, "GIT_AUTHOR_DATE"))
+	for (i = 0; i < kv.nr; i++) {
+		if (!strcmp(kv.items[i].string, "GIT_AUTHOR_NAME")) {
+			if (name_i >= 0)
+				name_i = error(_("'GIT_AUTHOR_NAME' already given"));
+			else
+				name_i = i;
+		} else if (!strcmp(kv.items[i].string, "GIT_AUTHOR_EMAIL")) {
+			if (email_i >= 0)
+				email_i = error(_("'GIT_AUTHOR_EMAIL' already given"));
+			else
+				email_i = i;
+		} else if (!strcmp(kv.items[i].string, "GIT_AUTHOR_DATE")) {
+			if (date_i >= 0)
+				date_i = error(_("'GIT_AUTHOR_DATE' already given"));
+			else
+				date_i = i;
+		} else {
+			err = error(_("unknown variable '%s'"),
+				    kv.items[i].string);
+		}
+	}
+	if (name_i == -2)
+		error(_("missing 'GIT_AUTHOR_NAME'"));
+	if (email_i == -2)
+		error(_("missing 'GIT_AUTHOR_EMAIL'"));
+	if (date_i == -2)
+		error(_("missing 'GIT_AUTHOR_DATE'"));
+	if (date_i < 0 || email_i < 0 || date_i < 0 || err)
 		goto finish;
-	state->author_name = kv.items[0].util;
-	state->author_email = kv.items[1].util;
-	state->author_date = kv.items[2].util;
+	state->author_name = kv.items[name_i].util;
+	state->author_email = kv.items[email_i].util;
+	state->author_date = kv.items[date_i].util;
 	retval = 0;
 finish:
 	string_list_clear(&kv, !!retval);
-- 
2.19.1


^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCH v3 3/5] am: rename read_author_script()
  2018-10-30 10:39 ` [PATCH v3 " Phillip Wood
  2018-10-30 10:39   ` [PATCH v3 1/5] am: don't die in read_author_script() Phillip Wood
  2018-10-30 10:39   ` [PATCH v3 2/5] am: improve author-script error reporting Phillip Wood
@ 2018-10-30 10:39   ` Phillip Wood
  2018-10-30 10:39   ` [PATCH v3 4/5] add read_author_script() to libgit Phillip Wood
                     ` (2 subsequent siblings)
  5 siblings, 0 replies; 41+ messages in thread
From: Phillip Wood @ 2018-10-30 10:39 UTC (permalink / raw)
  To: Git Mailing List, Junio C Hamano
  Cc: Eric Sunshine, Johannes Schindelin, Phillip Wood

From: Phillip Wood <phillip.wood@dunelm.org.uk>

Rename read_author_script() in preparation for adding a shared
read_author_script() function to libgit.

Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
---
 builtin/am.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/builtin/am.c b/builtin/am.c
index d42b725273..991d13f9a2 100644
--- a/builtin/am.c
+++ b/builtin/am.c
@@ -306,7 +306,7 @@ static int parse_key_value_squoted(char *buf, struct string_list *list)
  * script, and thus if the file differs from what this function expects, it is
  * better to bail out than to do something that the user does not expect.
  */
-static int read_author_script(struct am_state *state)
+static int read_am_author_script(struct am_state *state)
 {
 	const char *filename = am_path(state, "author-script");
 	struct strbuf buf = STRBUF_INIT;
@@ -441,7 +441,7 @@ static void am_load(struct am_state *state)
 		BUG("state file 'last' does not exist");
 	state->last = strtol(sb.buf, NULL, 10);
 
-	if (read_author_script(state) < 0)
+	if (read_am_author_script(state) < 0)
 		die(_("could not parse author script"));
 
 	read_commit_msg(state);
-- 
2.19.1


^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCH v3 4/5] add read_author_script() to libgit
  2018-10-30 10:39 ` [PATCH v3 " Phillip Wood
                     ` (2 preceding siblings ...)
  2018-10-30 10:39   ` [PATCH v3 3/5] am: rename read_author_script() Phillip Wood
@ 2018-10-30 10:39   ` Phillip Wood
  2018-10-30 10:39   ` [PATCH v3 5/5] sequencer: use read_author_script() Phillip Wood
  2018-10-31  2:50   ` [PATCH v3 0/5] am/rebase: share read_author_script() Junio C Hamano
  5 siblings, 0 replies; 41+ messages in thread
From: Phillip Wood @ 2018-10-30 10:39 UTC (permalink / raw)
  To: Git Mailing List, Junio C Hamano
  Cc: Eric Sunshine, Johannes Schindelin, Phillip Wood

From: Phillip Wood <phillip.wood@dunelm.org.uk>

Add read_author_script() to sequencer.c based on the implementation in
builtin/am.c and update read_am_author_script() to use
read_author_script(). The sequencer code that reads the author script
will be updated in the next commit.

Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
---

Notes:
    changes since v1:
     - added comment above read_author_script()
     - rebased to reflect changes added in patch 2

 builtin/am.c |  86 +----------------------------------------
 sequencer.c  | 105 +++++++++++++++++++++++++++++++++++++++++++++++++++
 sequencer.h  |   3 ++
 3 files changed, 110 insertions(+), 84 deletions(-)

diff --git a/builtin/am.c b/builtin/am.c
index 991d13f9a2..c5373158c0 100644
--- a/builtin/am.c
+++ b/builtin/am.c
@@ -260,36 +260,6 @@ static int read_state_file(struct strbuf *sb, const struct am_state *state,
 	die_errno(_("could not read '%s'"), am_path(state, file));
 }
 
-/**
- * Take a series of KEY='VALUE' lines where VALUE part is
- * sq-quoted, and append <KEY, VALUE> at the end of the string list
- */
-static int parse_key_value_squoted(char *buf, struct string_list *list)
-{
-	while (*buf) {
-		struct string_list_item *item;
-		char *np;
-		char *cp = strchr(buf, '=');
-		if (!cp) {
-			np = strchrnul(buf, '\n');
-			return error(_("unable to parse '%.*s'"),
-				     (int) (np - buf), buf);
-		}
-		np = strchrnul(cp, '\n');
-		*cp++ = '\0';
-		item = string_list_append(list, buf);
-
-		buf = np + (*np == '\n');
-		*np = '\0';
-		cp = sq_dequote(cp);
-		if (!cp)
-			return error(_("unable to dequote value of '%s'"),
-				     item->string);
-		item->util = xstrdup(cp);
-	}
-	return 0;
-}
-
 /**
  * Reads and parses the state directory's "author-script" file, and sets
  * state->author_name, state->author_email and state->author_date accordingly.
@@ -309,65 +279,13 @@ static int parse_key_value_squoted(char *buf, struct string_list *list)
 static int read_am_author_script(struct am_state *state)
 {
 	const char *filename = am_path(state, "author-script");
-	struct strbuf buf = STRBUF_INIT;
-	struct string_list kv = STRING_LIST_INIT_DUP;
-	int retval = -1; /* assume failure */
-	int i, name_i = -2, email_i = -2, date_i = -2, err = 0;
-	int fd;
 
 	assert(!state->author_name);
 	assert(!state->author_email);
 	assert(!state->author_date);
 
-	fd = open(filename, O_RDONLY);
-	if (fd < 0) {
-		if (errno == ENOENT)
-			return 0;
-		return error_errno(_("could not open '%s' for reading"),
-				   filename);
-	}
-	strbuf_read(&buf, fd, 0);
-	close(fd);
-	if (parse_key_value_squoted(buf.buf, &kv))
-		goto finish;
-
-	for (i = 0; i < kv.nr; i++) {
-		if (!strcmp(kv.items[i].string, "GIT_AUTHOR_NAME")) {
-			if (name_i >= 0)
-				name_i = error(_("'GIT_AUTHOR_NAME' already given"));
-			else
-				name_i = i;
-		} else if (!strcmp(kv.items[i].string, "GIT_AUTHOR_EMAIL")) {
-			if (email_i >= 0)
-				email_i = error(_("'GIT_AUTHOR_EMAIL' already given"));
-			else
-				email_i = i;
-		} else if (!strcmp(kv.items[i].string, "GIT_AUTHOR_DATE")) {
-			if (date_i >= 0)
-				date_i = error(_("'GIT_AUTHOR_DATE' already given"));
-			else
-				date_i = i;
-		} else {
-			err = error(_("unknown variable '%s'"),
-				    kv.items[i].string);
-		}
-	}
-	if (name_i == -2)
-		error(_("missing 'GIT_AUTHOR_NAME'"));
-	if (email_i == -2)
-		error(_("missing 'GIT_AUTHOR_EMAIL'"));
-	if (date_i == -2)
-		error(_("missing 'GIT_AUTHOR_DATE'"));
-	if (date_i < 0 || email_i < 0 || date_i < 0 || err)
-		goto finish;
-	state->author_name = kv.items[name_i].util;
-	state->author_email = kv.items[email_i].util;
-	state->author_date = kv.items[date_i].util;
-	retval = 0;
-finish:
-	string_list_clear(&kv, !!retval);
-	strbuf_release(&buf);
-	return retval;
+	return read_author_script(filename, &state->author_name,
+				  &state->author_email, &state->author_date, 1);
 }
 
 /**
diff --git a/sequencer.c b/sequencer.c
index dc2c58d464..3530dbeb6c 100644
--- a/sequencer.c
+++ b/sequencer.c
@@ -660,6 +660,111 @@ static int write_author_script(const char *message)
 	return res;
 }
 
+/**
+ * Take a series of KEY='VALUE' lines where VALUE part is
+ * sq-quoted, and append <KEY, VALUE> at the end of the string list
+ */
+static int parse_key_value_squoted(char *buf, struct string_list *list)
+{
+	while (*buf) {
+		struct string_list_item *item;
+		char *np;
+		char *cp = strchr(buf, '=');
+		if (!cp) {
+			np = strchrnul(buf, '\n');
+			return error(_("unable to parse '%.*s'"),
+				     (int) (np - buf), buf);
+		}
+		np = strchrnul(cp, '\n');
+		*cp++ = '\0';
+		item = string_list_append(list, buf);
+
+		buf = np + (*np == '\n');
+		*np = '\0';
+		cp = sq_dequote(cp);
+		if (!cp)
+			return error(_("unable to dequote value of '%s'"),
+				     item->string);
+		item->util = xstrdup(cp);
+	}
+	return 0;
+}
+
+/**
+ * Reads and parses the state directory's "author-script" file, and sets name,
+ * email and date accordingly.
+ * Returns 0 on success, -1 if the file could not be parsed.
+ *
+ * The author script is of the format:
+ *
+ *	GIT_AUTHOR_NAME='$author_name'
+ *	GIT_AUTHOR_EMAIL='$author_email'
+ *	GIT_AUTHOR_DATE='$author_date'
+ *
+ * where $author_name, $author_email and $author_date are quoted. We are strict
+ * with our parsing, as the file was meant to be eval'd in the old
+ * git-am.sh/git-rebase--interactive.sh scripts, and thus if the file differs
+ * from what this function expects, it is better to bail out than to do
+ * something that the user does not expect.
+ */
+int read_author_script(const char *path, char **name, char **email, char **date,
+		       int allow_missing)
+{
+	struct strbuf buf = STRBUF_INIT;
+	struct string_list kv = STRING_LIST_INIT_DUP;
+	int retval = -1; /* assume failure */
+	int i, name_i = -2, email_i = -2, date_i = -2, err = 0;
+
+	if (strbuf_read_file(&buf, path, 256) <= 0) {
+		strbuf_release(&buf);
+		if (errno == ENOENT && allow_missing)
+			return 0;
+		else
+			return error_errno(_("could not open '%s' for reading"),
+					   path);
+	}
+
+	if (parse_key_value_squoted(buf.buf, &kv))
+		goto finish;
+
+	for (i = 0; i < kv.nr; i++) {
+		if (!strcmp(kv.items[i].string, "GIT_AUTHOR_NAME")) {
+			if (name_i >= 0)
+				name_i = error(_("'GIT_AUTHOR_NAME' already given"));
+			else
+				name_i = i;
+		} else if (!strcmp(kv.items[i].string, "GIT_AUTHOR_EMAIL")) {
+			if (email_i >= 0)
+				email_i = error(_("'GIT_AUTHOR_EMAIL' already given"));
+			else
+				email_i = i;
+		} else if (!strcmp(kv.items[i].string, "GIT_AUTHOR_DATE")) {
+			if (date_i >= 0)
+				date_i = error(_("'GIT_AUTHOR_DATE' already given"));
+			else
+				date_i = i;
+		} else {
+			err = error(_("unknown variable '%s'"),
+				    kv.items[i].string);
+		}
+	}
+	if (name_i == -2)
+		error(_("missing 'GIT_AUTHOR_NAME'"));
+	if (email_i == -2)
+		error(_("missing 'GIT_AUTHOR_EMAIL'"));
+	if (date_i == -2)
+		error(_("missing 'GIT_AUTHOR_DATE'"));
+	if (date_i < 0 || email_i < 0 || date_i < 0 || err)
+		goto finish;
+	*name = kv.items[name_i].util;
+	*email = kv.items[email_i].util;
+	*date = kv.items[date_i].util;
+	retval = 0;
+finish:
+	string_list_clear(&kv, !!retval);
+	strbuf_release(&buf);
+	return retval;
+}
 
 /*
  * write_author_script() used to fail to terminate the last line with a "'" and
diff --git a/sequencer.h b/sequencer.h
index c751c9d6e4..3713f955f5 100644
--- a/sequencer.h
+++ b/sequencer.h
@@ -107,4 +107,7 @@ void commit_post_rewrite(const struct commit *current_head,
 #define SUMMARY_SHOW_AUTHOR_DATE (1 << 1)
 void print_commit_summary(const char *prefix, const struct object_id *oid,
 			  unsigned int flags);
+
+int read_author_script(const char *path, char **name, char **email, char **date,
+		       int allow_missing);
 #endif
-- 
2.19.1


^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCH v3 5/5] sequencer: use read_author_script()
  2018-10-30 10:39 ` [PATCH v3 " Phillip Wood
                     ` (3 preceding siblings ...)
  2018-10-30 10:39   ` [PATCH v3 4/5] add read_author_script() to libgit Phillip Wood
@ 2018-10-30 10:39   ` Phillip Wood
  2018-10-31  2:50   ` [PATCH v3 0/5] am/rebase: share read_author_script() Junio C Hamano
  5 siblings, 0 replies; 41+ messages in thread
From: Phillip Wood @ 2018-10-30 10:39 UTC (permalink / raw)
  To: Git Mailing List, Junio C Hamano
  Cc: Eric Sunshine, Johannes Schindelin, Phillip Wood

From: Phillip Wood <phillip.wood@dunelm.org.uk>

Use the new function added in the last commit to read the author
script, updating read_env_script() and read_author_ident(). We now
have a single code path that reads the author script for am and all
flavors of rebase. This changes the behavior of read_env_script() as
previously it would set any environment variables that were in the
author-script file. Now it is an error if the file contains other
variables or any of GIT_AUTHOR_NAME, GIT_AUTHOR_EMAIL and
GIT_AUTHOR_DATE are missing. This is what am and the non interactive
version of rebase have been doing for several years so hopefully it
will not cause a problem for interactive rebase users. The advantage
is that we are reusing existing code from am which uses sq_dequote()
to properly dequote variables. This fixes potential problems with user
edited scripts as read_env_script() which did not track quotes
properly.

This commit also removes the fallback code for checking for a broken
author script after git is upgraded when a rebase is stopped. Now that
the parsing uses sq_dequote() it will reliably return an error if the
quoting is broken and the user will have to abort the rebase and
restart. This isn't ideal but it's a corner case and the detection of
the broken quoting could be confused by user edited author scripts.

Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
---

Notes:
    changes since v1
     - use argv_array_pushf() as suggested by Eric
     - fixed strbuf handling as suggested by Eric
     - fix comments and commit message to reflect changed behavior of
       read_env_script()

 sequencer.c | 97 ++++++++++++-----------------------------------------
 1 file changed, 21 insertions(+), 76 deletions(-)

diff --git a/sequencer.c b/sequencer.c
index 3530dbeb6c..987542f67c 100644
--- a/sequencer.c
+++ b/sequencer.c
@@ -767,53 +767,24 @@ int read_author_script(const char *path, char **name, char **email, char **date,
 }
 
 /*
- * write_author_script() used to fail to terminate the last line with a "'" and
- * also escaped "'" incorrectly as "'\\\\''" rather than "'\\''". We check for
- * the terminating "'" on the last line to see how "'" has been escaped in case
- * git was upgraded while rebase was stopped.
- */
-static int quoting_is_broken(const char *s, size_t n)
-{
-	/* Skip any empty lines in case the file was hand edited */
-	while (n > 0 && s[--n] == '\n')
-		; /* empty */
-	if (n > 0 && s[n] != '\'')
-		return 1;
-
-	return 0;
-}
-
-/*
- * Read a list of environment variable assignments (such as the author-script
- * file) into an environment block. Returns -1 on error, 0 otherwise.
+ * Read a GIT_AUTHOR_NAME, GIT_AUTHOR_EMAIL AND GIT_AUTHOR_DATE from a
+ * file with shell quoting into struct argv_array. Returns -1 on
+ * error, 0 otherwise.
  */
 static int read_env_script(struct argv_array *env)
 {
-	struct strbuf script = STRBUF_INIT;
-	int i, count = 0, sq_bug;
-	const char *p2;
-	char *p;
+	char *name, *email, *date;
 
-	if (strbuf_read_file(&script, rebase_path_author_script(), 256) <= 0)
+	if (read_author_script(rebase_path_author_script(),
+			       &name, &email, &date, 0))
 		return -1;
-	/* write_author_script() used to quote incorrectly */
-	sq_bug = quoting_is_broken(script.buf, script.len);
-	for (p = script.buf; *p; p++)
-		if (sq_bug && skip_prefix(p, "'\\\\''", &p2))
-			strbuf_splice(&script, p - script.buf, p2 - p, "'", 1);
-		else if (skip_prefix(p, "'\\''", &p2))
-			strbuf_splice(&script, p - script.buf, p2 - p, "'", 1);
-		else if (*p == '\'')
-			strbuf_splice(&script, p-- - script.buf, 1, "", 0);
-		else if (*p == '\n') {
-			*p = '\0';
-			count++;
-		}
 
-	for (i = 0, p = script.buf; i < count; i++) {
-		argv_array_push(env, p);
-		p += strlen(p) + 1;
-	}
+	argv_array_pushf(env, "GIT_AUTHOR_NAME=%s", name);
+	argv_array_pushf(env, "GIT_AUTHOR_EMAIL=%s", email);
+	argv_array_pushf(env, "GIT_AUTHOR_DATE=%s", date);
+	free(name);
+	free(email);
+	free(date);
 
 	return 0;
 }
@@ -833,54 +804,28 @@ static char *get_author(const char *message)
 /* Read author-script and return an ident line (author <email> timestamp) */
 static const char *read_author_ident(struct strbuf *buf)
 {
-	const char *keys[] = {
-		"GIT_AUTHOR_NAME=", "GIT_AUTHOR_EMAIL=", "GIT_AUTHOR_DATE="
-	};
 	struct strbuf out = STRBUF_INIT;
-	char *in, *eol;
-	const char *val[3];
-	int i = 0;
+	char *name, *email, *date;
 
-	if (strbuf_read_file(buf, rebase_path_author_script(), 256) <= 0)
+	if (read_author_script(rebase_path_author_script(),
+			       &name, &email, &date, 0))
 		return NULL;
 
-	/* dequote values and construct ident line in-place */
-	for (in = buf->buf; i < 3 && in - buf->buf < buf->len; i++) {
-		if (!skip_prefix(in, keys[i], (const char **)&in)) {
-			warning(_("could not parse '%s' (looking for '%s')"),
-				rebase_path_author_script(), keys[i]);
-			return NULL;
-		}
-
-		eol = strchrnul(in, '\n');
-		*eol = '\0';
-		if (!sq_dequote(in)) {
-			warning(_("bad quoting on %s value in '%s'"),
-				keys[i], rebase_path_author_script());
-			return NULL;
-		}
-		val[i] = in;
-		in = eol + 1;
-	}
-
-	if (i < 3) {
-		warning(_("could not parse '%s' (looking for '%s')"),
-			rebase_path_author_script(), keys[i]);
-		return NULL;
-	}
-
 	/* validate date since fmt_ident() will die() on bad value */
-	if (parse_date(val[2], &out)){
+	if (parse_date(date, &out)){
 		warning(_("invalid date format '%s' in '%s'"),
-			val[2], rebase_path_author_script());
+			date, rebase_path_author_script());
 		strbuf_release(&out);
 		return NULL;
 	}
 
 	strbuf_reset(&out);
-	strbuf_addstr(&out, fmt_ident(val[0], val[1], val[2], 0));
+	strbuf_addstr(&out, fmt_ident(name, email, date, 0));
 	strbuf_swap(buf, &out);
 	strbuf_release(&out);
+	free(name);
+	free(email);
+	free(date);
 	return buf->buf;
 }
 
-- 
2.19.1


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v3 0/5] am/rebase: share read_author_script()
  2018-10-30 10:39 ` [PATCH v3 " Phillip Wood
                     ` (4 preceding siblings ...)
  2018-10-30 10:39   ` [PATCH v3 5/5] sequencer: use read_author_script() Phillip Wood
@ 2018-10-31  2:50   ` Junio C Hamano
  2018-10-31 10:14     ` Phillip Wood
  5 siblings, 1 reply; 41+ messages in thread
From: Junio C Hamano @ 2018-10-31  2:50 UTC (permalink / raw)
  To: Phillip Wood
  Cc: Git Mailing List, Eric Sunshine, Johannes Schindelin, Phillip Wood

Phillip Wood <phillip.wood@talktalk.net> writes:

> From: Phillip Wood <phillip.wood@dunelm.org.uk>
>
> Thanks to Junio for the feedback on v2. I've updated patch 4 based on
> those comments, the rest are unchanged.

Hmph, all these five patches seem to be identical to what I have in
'pu'.  Did you send the right version?

> v1 cover letter:
>
> This is a follow up to pw/rebase-i-author-script-fix, it reduces code
> duplication and improves rebase's parsing of the author script. After
> this I'll do another series to share the code to write the author
> script.
>
> Phillip Wood (5):
>   am: don't die in read_author_script()
>   am: improve author-script error reporting
>   am: rename read_author_script()
>   add read_author_script() to libgit
>   sequencer: use read_author_script()
>
>  builtin/am.c |  60 ++--------------
>  sequencer.c  | 192 ++++++++++++++++++++++++++++++++-------------------
>  sequencer.h  |   3 +
>  3 files changed, 128 insertions(+), 127 deletions(-)

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v3 0/5] am/rebase: share read_author_script()
  2018-10-31  2:50   ` [PATCH v3 0/5] am/rebase: share read_author_script() Junio C Hamano
@ 2018-10-31 10:14     ` Phillip Wood
  0 siblings, 0 replies; 41+ messages in thread
From: Phillip Wood @ 2018-10-31 10:14 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Git Mailing List, Eric Sunshine, Johannes Schindelin, Phillip Wood

On 31/10/2018 02:50, Junio C Hamano wrote:
> Phillip Wood <phillip.wood@talktalk.net> writes:
> 
>> From: Phillip Wood <phillip.wood@dunelm.org.uk>
>>
>> Thanks to Junio for the feedback on v2. I've updated patch 4 based on
>> those comments, the rest are unchanged.
> 
> Hmph, all these five patches seem to be identical to what I have in
> 'pu'.  Did you send the right version?

Oh dear that's embarrassing. I updated the patches on my laptop and
forget to sync before sending them from my desktop. I'll send v4 now.

Sorry for the confusion

Phillip

> 
>> v1 cover letter:
>>
>> This is a follow up to pw/rebase-i-author-script-fix, it reduces code
>> duplication and improves rebase's parsing of the author script. After
>> this I'll do another series to share the code to write the author
>> script.
>>
>> Phillip Wood (5):
>>   am: don't die in read_author_script()
>>   am: improve author-script error reporting
>>   am: rename read_author_script()
>>   add read_author_script() to libgit
>>   sequencer: use read_author_script()
>>
>>  builtin/am.c |  60 ++--------------
>>  sequencer.c  | 192 ++++++++++++++++++++++++++++++++-------------------
>>  sequencer.h  |   3 +
>>  3 files changed, 128 insertions(+), 127 deletions(-)


^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCH v4 0/5] am/rebase: share read_author_script()
  2018-09-12 10:10 [PATCH 0/3] am/rebase: share read_author_script() Phillip Wood
                   ` (4 preceding siblings ...)
  2018-10-30 10:39 ` [PATCH v3 " Phillip Wood
@ 2018-10-31 10:15 ` " Phillip Wood
  2018-10-31 10:15   ` [PATCH v4 1/5] am: don't die in read_author_script() Phillip Wood
                     ` (5 more replies)
  5 siblings, 6 replies; 41+ messages in thread
From: Phillip Wood @ 2018-10-31 10:15 UTC (permalink / raw)
  To: Git Mailing List, Junio C Hamano
  Cc: Eric Sunshine, Johannes Schindelin, Phillip Wood

From: Phillip Wood <phillip.wood@dunelm.org.uk>

Sorry for the confusion with v3, here are the updated patches.

Thanks to Junio for the feedback on v2. I've updated patch 4 based on
those comments, the rest are unchanged.

v1 cover letter:

This is a follow up to pw/rebase-i-author-script-fix, it reduces code
duplication and improves rebase's parsing of the author script. After
this I'll do another series to share the code to write the author
script.


Phillip Wood (5):
  am: don't die in read_author_script()
  am: improve author-script error reporting
  am: rename read_author_script()
  add read_author_script() to libgit
  sequencer: use read_author_script()

 builtin/am.c |  60 ++--------------
 sequencer.c  | 192 ++++++++++++++++++++++++++++++++-------------------
 sequencer.h  |   3 +
 3 files changed, 128 insertions(+), 127 deletions(-)

-- 
2.19.1


^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCH v4 1/5] am: don't die in read_author_script()
  2018-10-31 10:15 ` [PATCH v4 " Phillip Wood
@ 2018-10-31 10:15   ` Phillip Wood
  2018-10-31 10:15   ` [PATCH v4 2/5] am: improve author-script error reporting Phillip Wood
                     ` (4 subsequent siblings)
  5 siblings, 0 replies; 41+ messages in thread
From: Phillip Wood @ 2018-10-31 10:15 UTC (permalink / raw)
  To: Git Mailing List, Junio C Hamano
  Cc: Eric Sunshine, Johannes Schindelin, Phillip Wood

From: Phillip Wood <phillip.wood@dunelm.org.uk>

The caller is already prepared to handle errors returned from this
function so there is no need for it to die if it cannot read the file.

Suggested-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
---
 builtin/am.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/builtin/am.c b/builtin/am.c
index 5e866d17c7..b68578bc3f 100644
--- a/builtin/am.c
+++ b/builtin/am.c
@@ -318,7 +318,8 @@ static int read_author_script(struct am_state *state)
 	if (fd < 0) {
 		if (errno == ENOENT)
 			return 0;
-		die_errno(_("could not open '%s' for reading"), filename);
+		return error_errno(_("could not open '%s' for reading"),
+				   filename);
 	}
 	strbuf_read(&buf, fd, 0);
 	close(fd);
-- 
2.19.1


^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCH v4 2/5] am: improve author-script error reporting
  2018-10-31 10:15 ` [PATCH v4 " Phillip Wood
  2018-10-31 10:15   ` [PATCH v4 1/5] am: don't die in read_author_script() Phillip Wood
@ 2018-10-31 10:15   ` Phillip Wood
  2018-11-04  4:12     ` Eric Sunshine
  2018-10-31 10:15   ` [PATCH v4 3/5] am: rename read_author_script() Phillip Wood
                     ` (3 subsequent siblings)
  5 siblings, 1 reply; 41+ messages in thread
From: Phillip Wood @ 2018-10-31 10:15 UTC (permalink / raw)
  To: Git Mailing List, Junio C Hamano
  Cc: Eric Sunshine, Johannes Schindelin, Phillip Wood

From: Phillip Wood <phillip.wood@dunelm.org.uk>

If there are errors in a user edited author-script there was no
indication of what was wrong. This commit adds some specific error messages
depending on the problem. It also relaxes the requirement that the
variables appear in a specific order in the file to match the behavior
of 'rebase --interactive'.

Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
---
 builtin/am.c | 49 +++++++++++++++++++++++++++++++++++++++----------
 1 file changed, 39 insertions(+), 10 deletions(-)

diff --git a/builtin/am.c b/builtin/am.c
index b68578bc3f..d42b725273 100644
--- a/builtin/am.c
+++ b/builtin/am.c
@@ -270,8 +270,11 @@ static int parse_key_value_squoted(char *buf, struct string_list *list)
 		struct string_list_item *item;
 		char *np;
 		char *cp = strchr(buf, '=');
-		if (!cp)
-			return -1;
+		if (!cp) {
+			np = strchrnul(buf, '\n');
+			return error(_("unable to parse '%.*s'"),
+				     (int) (np - buf), buf);
+		}
 		np = strchrnul(cp, '\n');
 		*cp++ = '\0';
 		item = string_list_append(list, buf);
@@ -280,7 +283,8 @@ static int parse_key_value_squoted(char *buf, struct string_list *list)
 		*np = '\0';
 		cp = sq_dequote(cp);
 		if (!cp)
-			return -1;
+			return error(_("unable to dequote value of '%s'"),
+				     item->string);
 		item->util = xstrdup(cp);
 	}
 	return 0;
@@ -308,6 +312,7 @@ static int read_author_script(struct am_state *state)
 	struct strbuf buf = STRBUF_INIT;
 	struct string_list kv = STRING_LIST_INIT_DUP;
 	int retval = -1; /* assume failure */
+	int i, name_i = -2, email_i = -2, date_i = -2, err = 0;
 	int fd;
 
 	assert(!state->author_name);
@@ -326,14 +331,38 @@ static int read_author_script(struct am_state *state)
 	if (parse_key_value_squoted(buf.buf, &kv))
 		goto finish;
 
-	if (kv.nr != 3 ||
-	    strcmp(kv.items[0].string, "GIT_AUTHOR_NAME") ||
-	    strcmp(kv.items[1].string, "GIT_AUTHOR_EMAIL") ||
-	    strcmp(kv.items[2].string, "GIT_AUTHOR_DATE"))
+	for (i = 0; i < kv.nr; i++) {
+		if (!strcmp(kv.items[i].string, "GIT_AUTHOR_NAME")) {
+			if (name_i >= 0)
+				name_i = error(_("'GIT_AUTHOR_NAME' already given"));
+			else
+				name_i = i;
+		} else if (!strcmp(kv.items[i].string, "GIT_AUTHOR_EMAIL")) {
+			if (email_i >= 0)
+				email_i = error(_("'GIT_AUTHOR_EMAIL' already given"));
+			else
+				email_i = i;
+		} else if (!strcmp(kv.items[i].string, "GIT_AUTHOR_DATE")) {
+			if (date_i >= 0)
+				date_i = error(_("'GIT_AUTHOR_DATE' already given"));
+			else
+				date_i = i;
+		} else {
+			err = error(_("unknown variable '%s'"),
+				    kv.items[i].string);
+		}
+	}
+	if (name_i == -2)
+		error(_("missing 'GIT_AUTHOR_NAME'"));
+	if (email_i == -2)
+		error(_("missing 'GIT_AUTHOR_EMAIL'"));
+	if (date_i == -2)
+		error(_("missing 'GIT_AUTHOR_DATE'"));
+	if (date_i < 0 || email_i < 0 || date_i < 0 || err)
 		goto finish;
-	state->author_name = kv.items[0].util;
-	state->author_email = kv.items[1].util;
-	state->author_date = kv.items[2].util;
+	state->author_name = kv.items[name_i].util;
+	state->author_email = kv.items[email_i].util;
+	state->author_date = kv.items[date_i].util;
 	retval = 0;
 finish:
 	string_list_clear(&kv, !!retval);
-- 
2.19.1


^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCH v4 3/5] am: rename read_author_script()
  2018-10-31 10:15 ` [PATCH v4 " Phillip Wood
  2018-10-31 10:15   ` [PATCH v4 1/5] am: don't die in read_author_script() Phillip Wood
  2018-10-31 10:15   ` [PATCH v4 2/5] am: improve author-script error reporting Phillip Wood
@ 2018-10-31 10:15   ` Phillip Wood
  2018-10-31 10:15   ` [PATCH v4 4/5] add read_author_script() to libgit Phillip Wood
                     ` (2 subsequent siblings)
  5 siblings, 0 replies; 41+ messages in thread
From: Phillip Wood @ 2018-10-31 10:15 UTC (permalink / raw)
  To: Git Mailing List, Junio C Hamano
  Cc: Eric Sunshine, Johannes Schindelin, Phillip Wood

From: Phillip Wood <phillip.wood@dunelm.org.uk>

Rename read_author_script() in preparation for adding a shared
read_author_script() function to libgit.

Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
---
 builtin/am.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/builtin/am.c b/builtin/am.c
index d42b725273..991d13f9a2 100644
--- a/builtin/am.c
+++ b/builtin/am.c
@@ -306,7 +306,7 @@ static int parse_key_value_squoted(char *buf, struct string_list *list)
  * script, and thus if the file differs from what this function expects, it is
  * better to bail out than to do something that the user does not expect.
  */
-static int read_author_script(struct am_state *state)
+static int read_am_author_script(struct am_state *state)
 {
 	const char *filename = am_path(state, "author-script");
 	struct strbuf buf = STRBUF_INIT;
@@ -441,7 +441,7 @@ static void am_load(struct am_state *state)
 		BUG("state file 'last' does not exist");
 	state->last = strtol(sb.buf, NULL, 10);
 
-	if (read_author_script(state) < 0)
+	if (read_am_author_script(state) < 0)
 		die(_("could not parse author script"));
 
 	read_commit_msg(state);
-- 
2.19.1


^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCH v4 4/5] add read_author_script() to libgit
  2018-10-31 10:15 ` [PATCH v4 " Phillip Wood
                     ` (2 preceding siblings ...)
  2018-10-31 10:15   ` [PATCH v4 3/5] am: rename read_author_script() Phillip Wood
@ 2018-10-31 10:15   ` Phillip Wood
  2018-10-31 10:15   ` [PATCH v4 5/5] sequencer: use read_author_script() Phillip Wood
  2018-11-01  3:01   ` [PATCH v4 0/5] am/rebase: share read_author_script() Junio C Hamano
  5 siblings, 0 replies; 41+ messages in thread
From: Phillip Wood @ 2018-10-31 10:15 UTC (permalink / raw)
  To: Git Mailing List, Junio C Hamano
  Cc: Eric Sunshine, Johannes Schindelin, Phillip Wood

From: Phillip Wood <phillip.wood@dunelm.org.uk>

Add read_author_script() to sequencer.c based on the implementation in
builtin/am.c and update read_am_author_script() to use
read_author_script(). The sequencer code that reads the author script
will be updated in the next commit.

Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
---

Notes:
    changes since v2:
     - tweaked an error message as suggested by Junio
     - fixed corner case where a key is given three times (Thanks to Junio
       for pointing this out)
    
    changes since v1:
     - added comment above read_author_script()
     - rebased to reflect changes added in patch 2

 builtin/am.c |  86 +----------------------------------------
 sequencer.c  | 105 +++++++++++++++++++++++++++++++++++++++++++++++++++
 sequencer.h  |   3 ++
 3 files changed, 110 insertions(+), 84 deletions(-)

diff --git a/builtin/am.c b/builtin/am.c
index 991d13f9a2..c5373158c0 100644
--- a/builtin/am.c
+++ b/builtin/am.c
@@ -260,36 +260,6 @@ static int read_state_file(struct strbuf *sb, const struct am_state *state,
 	die_errno(_("could not read '%s'"), am_path(state, file));
 }
 
-/**
- * Take a series of KEY='VALUE' lines where VALUE part is
- * sq-quoted, and append <KEY, VALUE> at the end of the string list
- */
-static int parse_key_value_squoted(char *buf, struct string_list *list)
-{
-	while (*buf) {
-		struct string_list_item *item;
-		char *np;
-		char *cp = strchr(buf, '=');
-		if (!cp) {
-			np = strchrnul(buf, '\n');
-			return error(_("unable to parse '%.*s'"),
-				     (int) (np - buf), buf);
-		}
-		np = strchrnul(cp, '\n');
-		*cp++ = '\0';
-		item = string_list_append(list, buf);
-
-		buf = np + (*np == '\n');
-		*np = '\0';
-		cp = sq_dequote(cp);
-		if (!cp)
-			return error(_("unable to dequote value of '%s'"),
-				     item->string);
-		item->util = xstrdup(cp);
-	}
-	return 0;
-}
-
 /**
  * Reads and parses the state directory's "author-script" file, and sets
  * state->author_name, state->author_email and state->author_date accordingly.
@@ -309,65 +279,13 @@ static int parse_key_value_squoted(char *buf, struct string_list *list)
 static int read_am_author_script(struct am_state *state)
 {
 	const char *filename = am_path(state, "author-script");
-	struct strbuf buf = STRBUF_INIT;
-	struct string_list kv = STRING_LIST_INIT_DUP;
-	int retval = -1; /* assume failure */
-	int i, name_i = -2, email_i = -2, date_i = -2, err = 0;
-	int fd;
 
 	assert(!state->author_name);
 	assert(!state->author_email);
 	assert(!state->author_date);
 
-	fd = open(filename, O_RDONLY);
-	if (fd < 0) {
-		if (errno == ENOENT)
-			return 0;
-		return error_errno(_("could not open '%s' for reading"),
-				   filename);
-	}
-	strbuf_read(&buf, fd, 0);
-	close(fd);
-	if (parse_key_value_squoted(buf.buf, &kv))
-		goto finish;
-
-	for (i = 0; i < kv.nr; i++) {
-		if (!strcmp(kv.items[i].string, "GIT_AUTHOR_NAME")) {
-			if (name_i >= 0)
-				name_i = error(_("'GIT_AUTHOR_NAME' already given"));
-			else
-				name_i = i;
-		} else if (!strcmp(kv.items[i].string, "GIT_AUTHOR_EMAIL")) {
-			if (email_i >= 0)
-				email_i = error(_("'GIT_AUTHOR_EMAIL' already given"));
-			else
-				email_i = i;
-		} else if (!strcmp(kv.items[i].string, "GIT_AUTHOR_DATE")) {
-			if (date_i >= 0)
-				date_i = error(_("'GIT_AUTHOR_DATE' already given"));
-			else
-				date_i = i;
-		} else {
-			err = error(_("unknown variable '%s'"),
-				    kv.items[i].string);
-		}
-	}
-	if (name_i == -2)
-		error(_("missing 'GIT_AUTHOR_NAME'"));
-	if (email_i == -2)
-		error(_("missing 'GIT_AUTHOR_EMAIL'"));
-	if (date_i == -2)
-		error(_("missing 'GIT_AUTHOR_DATE'"));
-	if (date_i < 0 || email_i < 0 || date_i < 0 || err)
-		goto finish;
-	state->author_name = kv.items[name_i].util;
-	state->author_email = kv.items[email_i].util;
-	state->author_date = kv.items[date_i].util;
-	retval = 0;
-finish:
-	string_list_clear(&kv, !!retval);
-	strbuf_release(&buf);
-	return retval;
+	return read_author_script(filename, &state->author_name,
+				  &state->author_email, &state->author_date, 1);
 }
 
 /**
diff --git a/sequencer.c b/sequencer.c
index dc2c58d464..af9987c807 100644
--- a/sequencer.c
+++ b/sequencer.c
@@ -660,6 +660,111 @@ static int write_author_script(const char *message)
 	return res;
 }
 
+/**
+ * Take a series of KEY='VALUE' lines where VALUE part is
+ * sq-quoted, and append <KEY, VALUE> at the end of the string list
+ */
+static int parse_key_value_squoted(char *buf, struct string_list *list)
+{
+	while (*buf) {
+		struct string_list_item *item;
+		char *np;
+		char *cp = strchr(buf, '=');
+		if (!cp) {
+			np = strchrnul(buf, '\n');
+			return error(_("no key present in '%.*s'"),
+				     (int) (np - buf), buf);
+		}
+		np = strchrnul(cp, '\n');
+		*cp++ = '\0';
+		item = string_list_append(list, buf);
+
+		buf = np + (*np == '\n');
+		*np = '\0';
+		cp = sq_dequote(cp);
+		if (!cp)
+			return error(_("unable to dequote value of '%s'"),
+				     item->string);
+		item->util = xstrdup(cp);
+	}
+	return 0;
+}
+
+/**
+ * Reads and parses the state directory's "author-script" file, and sets name,
+ * email and date accordingly.
+ * Returns 0 on success, -1 if the file could not be parsed.
+ *
+ * The author script is of the format:
+ *
+ *	GIT_AUTHOR_NAME='$author_name'
+ *	GIT_AUTHOR_EMAIL='$author_email'
+ *	GIT_AUTHOR_DATE='$author_date'
+ *
+ * where $author_name, $author_email and $author_date are quoted. We are strict
+ * with our parsing, as the file was meant to be eval'd in the old
+ * git-am.sh/git-rebase--interactive.sh scripts, and thus if the file differs
+ * from what this function expects, it is better to bail out than to do
+ * something that the user does not expect.
+ */
+int read_author_script(const char *path, char **name, char **email, char **date,
+		       int allow_missing)
+{
+	struct strbuf buf = STRBUF_INIT;
+	struct string_list kv = STRING_LIST_INIT_DUP;
+	int retval = -1; /* assume failure */
+	int i, name_i = -2, email_i = -2, date_i = -2, err = 0;
+
+	if (strbuf_read_file(&buf, path, 256) <= 0) {
+		strbuf_release(&buf);
+		if (errno == ENOENT && allow_missing)
+			return 0;
+		else
+			return error_errno(_("could not open '%s' for reading"),
+					   path);
+	}
+
+	if (parse_key_value_squoted(buf.buf, &kv))
+		goto finish;
+
+	for (i = 0; i < kv.nr; i++) {
+		if (!strcmp(kv.items[i].string, "GIT_AUTHOR_NAME")) {
+			if (name_i == -2)
+				name_i = i;
+			else
+				name_i = error(_("'GIT_AUTHOR_NAME' already given"));
+		} else if (!strcmp(kv.items[i].string, "GIT_AUTHOR_EMAIL")) {
+			if (email_i == -2)
+				email_i = i;
+			else
+				email_i = error(_("'GIT_AUTHOR_EMAIL' already given"));
+		} else if (!strcmp(kv.items[i].string, "GIT_AUTHOR_DATE")) {
+			if (date_i == -2)
+				date_i = i;
+			else
+				date_i = error(_("'GIT_AUTHOR_DATE' already given"));
+		} else {
+			err = error(_("unknown variable '%s'"),
+				    kv.items[i].string);
+		}
+	}
+	if (name_i == -2)
+		error(_("missing 'GIT_AUTHOR_NAME'"));
+	if (email_i == -2)
+		error(_("missing 'GIT_AUTHOR_EMAIL'"));
+	if (date_i == -2)
+		error(_("missing 'GIT_AUTHOR_DATE'"));
+	if (date_i < 0 || email_i < 0 || date_i < 0 || err)
+		goto finish;
+	*name = kv.items[name_i].util;
+	*email = kv.items[email_i].util;
+	*date = kv.items[date_i].util;
+	retval = 0;
+finish:
+	string_list_clear(&kv, !!retval);
+	strbuf_release(&buf);
+	return retval;
+}
 
 /*
  * write_author_script() used to fail to terminate the last line with a "'" and
diff --git a/sequencer.h b/sequencer.h
index c751c9d6e4..3713f955f5 100644
--- a/sequencer.h
+++ b/sequencer.h
@@ -107,4 +107,7 @@ void commit_post_rewrite(const struct commit *current_head,
 #define SUMMARY_SHOW_AUTHOR_DATE (1 << 1)
 void print_commit_summary(const char *prefix, const struct object_id *oid,
 			  unsigned int flags);
+
+int read_author_script(const char *path, char **name, char **email, char **date,
+		       int allow_missing);
 #endif
-- 
2.19.1


^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCH v4 5/5] sequencer: use read_author_script()
  2018-10-31 10:15 ` [PATCH v4 " Phillip Wood
                     ` (3 preceding siblings ...)
  2018-10-31 10:15   ` [PATCH v4 4/5] add read_author_script() to libgit Phillip Wood
@ 2018-10-31 10:15   ` Phillip Wood
  2018-11-01  3:01   ` [PATCH v4 0/5] am/rebase: share read_author_script() Junio C Hamano
  5 siblings, 0 replies; 41+ messages in thread
From: Phillip Wood @ 2018-10-31 10:15 UTC (permalink / raw)
  To: Git Mailing List, Junio C Hamano
  Cc: Eric Sunshine, Johannes Schindelin, Phillip Wood

From: Phillip Wood <phillip.wood@dunelm.org.uk>

Use the new function added in the last commit to read the author
script, updating read_env_script() and read_author_ident(). We now
have a single code path that reads the author script for am and all
flavors of rebase. This changes the behavior of read_env_script() as
previously it would set any environment variables that were in the
author-script file. Now it is an error if the file contains other
variables or any of GIT_AUTHOR_NAME, GIT_AUTHOR_EMAIL and
GIT_AUTHOR_DATE are missing. This is what am and the non interactive
version of rebase have been doing for several years so hopefully it
will not cause a problem for interactive rebase users. The advantage
is that we are reusing existing code from am which uses sq_dequote()
to properly dequote variables. This fixes potential problems with user
edited scripts as read_env_script() which did not track quotes
properly.

This commit also removes the fallback code for checking for a broken
author script after git is upgraded when a rebase is stopped. Now that
the parsing uses sq_dequote() it will reliably return an error if the
quoting is broken and the user will have to abort the rebase and
restart. This isn't ideal but it's a corner case and the detection of
the broken quoting could be confused by user edited author scripts.

Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
---

Notes:
    changes since v1
     - use argv_array_pushf() as suggested by Eric
     - fixed strbuf handling as suggested by Eric
     - fix comments and commit message to reflect changed behavior of
       read_env_script()

 sequencer.c | 97 ++++++++++++-----------------------------------------
 1 file changed, 21 insertions(+), 76 deletions(-)

diff --git a/sequencer.c b/sequencer.c
index af9987c807..09dc200b4f 100644
--- a/sequencer.c
+++ b/sequencer.c
@@ -767,53 +767,24 @@ int read_author_script(const char *path, char **name, char **email, char **date,
 }
 
 /*
- * write_author_script() used to fail to terminate the last line with a "'" and
- * also escaped "'" incorrectly as "'\\\\''" rather than "'\\''". We check for
- * the terminating "'" on the last line to see how "'" has been escaped in case
- * git was upgraded while rebase was stopped.
- */
-static int quoting_is_broken(const char *s, size_t n)
-{
-	/* Skip any empty lines in case the file was hand edited */
-	while (n > 0 && s[--n] == '\n')
-		; /* empty */
-	if (n > 0 && s[n] != '\'')
-		return 1;
-
-	return 0;
-}
-
-/*
- * Read a list of environment variable assignments (such as the author-script
- * file) into an environment block. Returns -1 on error, 0 otherwise.
+ * Read a GIT_AUTHOR_NAME, GIT_AUTHOR_EMAIL AND GIT_AUTHOR_DATE from a
+ * file with shell quoting into struct argv_array. Returns -1 on
+ * error, 0 otherwise.
  */
 static int read_env_script(struct argv_array *env)
 {
-	struct strbuf script = STRBUF_INIT;
-	int i, count = 0, sq_bug;
-	const char *p2;
-	char *p;
+	char *name, *email, *date;
 
-	if (strbuf_read_file(&script, rebase_path_author_script(), 256) <= 0)
+	if (read_author_script(rebase_path_author_script(),
+			       &name, &email, &date, 0))
 		return -1;
-	/* write_author_script() used to quote incorrectly */
-	sq_bug = quoting_is_broken(script.buf, script.len);
-	for (p = script.buf; *p; p++)
-		if (sq_bug && skip_prefix(p, "'\\\\''", &p2))
-			strbuf_splice(&script, p - script.buf, p2 - p, "'", 1);
-		else if (skip_prefix(p, "'\\''", &p2))
-			strbuf_splice(&script, p - script.buf, p2 - p, "'", 1);
-		else if (*p == '\'')
-			strbuf_splice(&script, p-- - script.buf, 1, "", 0);
-		else if (*p == '\n') {
-			*p = '\0';
-			count++;
-		}
 
-	for (i = 0, p = script.buf; i < count; i++) {
-		argv_array_push(env, p);
-		p += strlen(p) + 1;
-	}
+	argv_array_pushf(env, "GIT_AUTHOR_NAME=%s", name);
+	argv_array_pushf(env, "GIT_AUTHOR_EMAIL=%s", email);
+	argv_array_pushf(env, "GIT_AUTHOR_DATE=%s", date);
+	free(name);
+	free(email);
+	free(date);
 
 	return 0;
 }
@@ -833,54 +804,28 @@ static char *get_author(const char *message)
 /* Read author-script and return an ident line (author <email> timestamp) */
 static const char *read_author_ident(struct strbuf *buf)
 {
-	const char *keys[] = {
-		"GIT_AUTHOR_NAME=", "GIT_AUTHOR_EMAIL=", "GIT_AUTHOR_DATE="
-	};
 	struct strbuf out = STRBUF_INIT;
-	char *in, *eol;
-	const char *val[3];
-	int i = 0;
+	char *name, *email, *date;
 
-	if (strbuf_read_file(buf, rebase_path_author_script(), 256) <= 0)
+	if (read_author_script(rebase_path_author_script(),
+			       &name, &email, &date, 0))
 		return NULL;
 
-	/* dequote values and construct ident line in-place */
-	for (in = buf->buf; i < 3 && in - buf->buf < buf->len; i++) {
-		if (!skip_prefix(in, keys[i], (const char **)&in)) {
-			warning(_("could not parse '%s' (looking for '%s')"),
-				rebase_path_author_script(), keys[i]);
-			return NULL;
-		}
-
-		eol = strchrnul(in, '\n');
-		*eol = '\0';
-		if (!sq_dequote(in)) {
-			warning(_("bad quoting on %s value in '%s'"),
-				keys[i], rebase_path_author_script());
-			return NULL;
-		}
-		val[i] = in;
-		in = eol + 1;
-	}
-
-	if (i < 3) {
-		warning(_("could not parse '%s' (looking for '%s')"),
-			rebase_path_author_script(), keys[i]);
-		return NULL;
-	}
-
 	/* validate date since fmt_ident() will die() on bad value */
-	if (parse_date(val[2], &out)){
+	if (parse_date(date, &out)){
 		warning(_("invalid date format '%s' in '%s'"),
-			val[2], rebase_path_author_script());
+			date, rebase_path_author_script());
 		strbuf_release(&out);
 		return NULL;
 	}
 
 	strbuf_reset(&out);
-	strbuf_addstr(&out, fmt_ident(val[0], val[1], val[2], 0));
+	strbuf_addstr(&out, fmt_ident(name, email, date, 0));
 	strbuf_swap(buf, &out);
 	strbuf_release(&out);
+	free(name);
+	free(email);
+	free(date);
 	return buf->buf;
 }
 
-- 
2.19.1


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v4 0/5] am/rebase: share read_author_script()
  2018-10-31 10:15 ` [PATCH v4 " Phillip Wood
                     ` (4 preceding siblings ...)
  2018-10-31 10:15   ` [PATCH v4 5/5] sequencer: use read_author_script() Phillip Wood
@ 2018-11-01  3:01   ` Junio C Hamano
  2018-11-01  3:12     ` [PATCH v4fixed 2/5] am: improve author-script error reporting Junio C Hamano
  2018-11-01 16:27     ` [PATCH v4 0/5] am/rebase: share read_author_script() Phillip Wood
  5 siblings, 2 replies; 41+ messages in thread
From: Junio C Hamano @ 2018-11-01  3:01 UTC (permalink / raw)
  To: Phillip Wood
  Cc: Git Mailing List, Eric Sunshine, Johannes Schindelin, Phillip Wood

Phillip Wood <phillip.wood@talktalk.net> writes:

> From: Phillip Wood <phillip.wood@dunelm.org.uk>
>
> Sorry for the confusion with v3, here are the updated patches.
>
> Thanks to Junio for the feedback on v2. I've updated patch 4 based on
> those comments, the rest are unchanged.

The mistake of overwriting -1 (i.e. earlier we detected dup) with
the third instance of the same originates at [2/5], so updating
[4/5] without fixing it at its source would mean [4/5] is not a pure
code movement to make it available to libgit users---it instead hides
a (not so important) bugfix in it.

>
> v1 cover letter:
>
> This is a follow up to pw/rebase-i-author-script-fix, it reduces code
> duplication and improves rebase's parsing of the author script. After
> this I'll do another series to share the code to write the author
> script.
>
>
> Phillip Wood (5):
>   am: don't die in read_author_script()
>   am: improve author-script error reporting
>   am: rename read_author_script()
>   add read_author_script() to libgit
>   sequencer: use read_author_script()
>
>  builtin/am.c |  60 ++--------------
>  sequencer.c  | 192 ++++++++++++++++++++++++++++++++-------------------
>  sequencer.h  |   3 +
>  3 files changed, 128 insertions(+), 127 deletions(-)

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCH v4fixed 2/5] am: improve author-script error reporting
  2018-11-01  3:01   ` [PATCH v4 0/5] am/rebase: share read_author_script() Junio C Hamano
@ 2018-11-01  3:12     ` Junio C Hamano
  2018-11-01  3:12       ` [PATCH v4fixed 4/5] add read_author_script() to libgit Junio C Hamano
  2018-11-01 16:27     ` [PATCH v4 0/5] am/rebase: share read_author_script() Phillip Wood
  1 sibling, 1 reply; 41+ messages in thread
From: Junio C Hamano @ 2018-11-01  3:12 UTC (permalink / raw)
  To: git; +Cc: Phillip Wood

From: Phillip Wood <phillip.wood@dunelm.org.uk>

If there are errors in a user edited author-script there was no
indication of what was wrong. This commit adds some specific error messages
depending on the problem. It also relaxes the requirement that the
variables appear in a specific order in the file to match the behavior
of 'rebase --interactive'.

Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 builtin/am.c | 49 +++++++++++++++++++++++++++++++++++++++----------
 1 file changed, 39 insertions(+), 10 deletions(-)

diff --git a/builtin/am.c b/builtin/am.c
index 4f7f28a9dc..ffca4479d7 100644
--- a/builtin/am.c
+++ b/builtin/am.c
@@ -270,8 +270,11 @@ static int parse_key_value_squoted(char *buf, struct string_list *list)
 		struct string_list_item *item;
 		char *np;
 		char *cp = strchr(buf, '=');
-		if (!cp)
-			return -1;
+		if (!cp) {
+			np = strchrnul(buf, '\n');
+			return error(_("unable to parse '%.*s'"),
+				     (int) (np - buf), buf);
+		}
 		np = strchrnul(cp, '\n');
 		*cp++ = '\0';
 		item = string_list_append(list, buf);
@@ -280,7 +283,8 @@ static int parse_key_value_squoted(char *buf, struct string_list *list)
 		*np = '\0';
 		cp = sq_dequote(cp);
 		if (!cp)
-			return -1;
+			return error(_("unable to dequote value of '%s'"),
+				     item->string);
 		item->util = xstrdup(cp);
 	}
 	return 0;
@@ -308,6 +312,7 @@ static int read_author_script(struct am_state *state)
 	struct strbuf buf = STRBUF_INIT;
 	struct string_list kv = STRING_LIST_INIT_DUP;
 	int retval = -1; /* assume failure */
+	int i, name_i = -2, email_i = -2, date_i = -2, err = 0;
 	int fd;
 
 	assert(!state->author_name);
@@ -326,14 +331,38 @@ static int read_author_script(struct am_state *state)
 	if (parse_key_value_squoted(buf.buf, &kv))
 		goto finish;
 
-	if (kv.nr != 3 ||
-	    strcmp(kv.items[0].string, "GIT_AUTHOR_NAME") ||
-	    strcmp(kv.items[1].string, "GIT_AUTHOR_EMAIL") ||
-	    strcmp(kv.items[2].string, "GIT_AUTHOR_DATE"))
+	for (i = 0; i < kv.nr; i++) {
+		if (!strcmp(kv.items[i].string, "GIT_AUTHOR_NAME")) {
+			if (name_i != -2)
+				name_i = error(_("'GIT_AUTHOR_NAME' already given"));
+			else
+				name_i = i;
+		} else if (!strcmp(kv.items[i].string, "GIT_AUTHOR_EMAIL")) {
+			if (email_i != -2)
+				email_i = error(_("'GIT_AUTHOR_EMAIL' already given"));
+			else
+				email_i = i;
+		} else if (!strcmp(kv.items[i].string, "GIT_AUTHOR_DATE")) {
+			if (date_i != -2)
+				date_i = error(_("'GIT_AUTHOR_DATE' already given"));
+			else
+				date_i = i;
+		} else {
+			err = error(_("unknown variable '%s'"),
+				    kv.items[i].string);
+		}
+	}
+	if (name_i == -2)
+		error(_("missing 'GIT_AUTHOR_NAME'"));
+	if (email_i == -2)
+		error(_("missing 'GIT_AUTHOR_EMAIL'"));
+	if (date_i == -2)
+		error(_("missing 'GIT_AUTHOR_DATE'"));
+	if (date_i < 0 || email_i < 0 || date_i < 0 || err)
 		goto finish;
-	state->author_name = kv.items[0].util;
-	state->author_email = kv.items[1].util;
-	state->author_date = kv.items[2].util;
+	state->author_name = kv.items[name_i].util;
+	state->author_email = kv.items[email_i].util;
+	state->author_date = kv.items[date_i].util;
 	retval = 0;
 finish:
 	string_list_clear(&kv, !!retval);
-- 
2.19.1-708-g4ede3d42df


^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCH v4fixed 4/5] add read_author_script() to libgit
  2018-11-01  3:12     ` [PATCH v4fixed 2/5] am: improve author-script error reporting Junio C Hamano
@ 2018-11-01  3:12       ` Junio C Hamano
  0 siblings, 0 replies; 41+ messages in thread
From: Junio C Hamano @ 2018-11-01  3:12 UTC (permalink / raw)
  To: git; +Cc: Phillip Wood

From: Phillip Wood <phillip.wood@dunelm.org.uk>

Add read_author_script() to sequencer.c based on the implementation in
builtin/am.c and update read_am_author_script() to use
read_author_script(). The sequencer code that reads the author script
will be updated in the next commit.

Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 builtin/am.c |  86 +----------------------------------------
 sequencer.c  | 105 +++++++++++++++++++++++++++++++++++++++++++++++++++
 sequencer.h  |   3 ++
 3 files changed, 110 insertions(+), 84 deletions(-)

diff --git a/builtin/am.c b/builtin/am.c
index c78a745289..83685180e0 100644
--- a/builtin/am.c
+++ b/builtin/am.c
@@ -260,36 +260,6 @@ static int read_state_file(struct strbuf *sb, const struct am_state *state,
 	die_errno(_("could not read '%s'"), am_path(state, file));
 }
 
-/**
- * Take a series of KEY='VALUE' lines where VALUE part is
- * sq-quoted, and append <KEY, VALUE> at the end of the string list
- */
-static int parse_key_value_squoted(char *buf, struct string_list *list)
-{
-	while (*buf) {
-		struct string_list_item *item;
-		char *np;
-		char *cp = strchr(buf, '=');
-		if (!cp) {
-			np = strchrnul(buf, '\n');
-			return error(_("unable to parse '%.*s'"),
-				     (int) (np - buf), buf);
-		}
-		np = strchrnul(cp, '\n');
-		*cp++ = '\0';
-		item = string_list_append(list, buf);
-
-		buf = np + (*np == '\n');
-		*np = '\0';
-		cp = sq_dequote(cp);
-		if (!cp)
-			return error(_("unable to dequote value of '%s'"),
-				     item->string);
-		item->util = xstrdup(cp);
-	}
-	return 0;
-}
-
 /**
  * Reads and parses the state directory's "author-script" file, and sets
  * state->author_name, state->author_email and state->author_date accordingly.
@@ -309,65 +279,13 @@ static int parse_key_value_squoted(char *buf, struct string_list *list)
 static int read_am_author_script(struct am_state *state)
 {
 	const char *filename = am_path(state, "author-script");
-	struct strbuf buf = STRBUF_INIT;
-	struct string_list kv = STRING_LIST_INIT_DUP;
-	int retval = -1; /* assume failure */
-	int i, name_i = -2, email_i = -2, date_i = -2, err = 0;
-	int fd;
 
 	assert(!state->author_name);
 	assert(!state->author_email);
 	assert(!state->author_date);
 
-	fd = open(filename, O_RDONLY);
-	if (fd < 0) {
-		if (errno == ENOENT)
-			return 0;
-		return error_errno(_("could not open '%s' for reading"),
-				   filename);
-	}
-	strbuf_read(&buf, fd, 0);
-	close(fd);
-	if (parse_key_value_squoted(buf.buf, &kv))
-		goto finish;
-
-	for (i = 0; i < kv.nr; i++) {
-		if (!strcmp(kv.items[i].string, "GIT_AUTHOR_NAME")) {
-			if (name_i != -2)
-				name_i = error(_("'GIT_AUTHOR_NAME' already given"));
-			else
-				name_i = i;
-		} else if (!strcmp(kv.items[i].string, "GIT_AUTHOR_EMAIL")) {
-			if (email_i != -2)
-				email_i = error(_("'GIT_AUTHOR_EMAIL' already given"));
-			else
-				email_i = i;
-		} else if (!strcmp(kv.items[i].string, "GIT_AUTHOR_DATE")) {
-			if (date_i != -2)
-				date_i = error(_("'GIT_AUTHOR_DATE' already given"));
-			else
-				date_i = i;
-		} else {
-			err = error(_("unknown variable '%s'"),
-				    kv.items[i].string);
-		}
-	}
-	if (name_i == -2)
-		error(_("missing 'GIT_AUTHOR_NAME'"));
-	if (email_i == -2)
-		error(_("missing 'GIT_AUTHOR_EMAIL'"));
-	if (date_i == -2)
-		error(_("missing 'GIT_AUTHOR_DATE'"));
-	if (date_i < 0 || email_i < 0 || date_i < 0 || err)
-		goto finish;
-	state->author_name = kv.items[name_i].util;
-	state->author_email = kv.items[email_i].util;
-	state->author_date = kv.items[date_i].util;
-	retval = 0;
-finish:
-	string_list_clear(&kv, !!retval);
-	strbuf_release(&buf);
-	return retval;
+	return read_author_script(filename, &state->author_name,
+				  &state->author_email, &state->author_date, 1);
 }
 
 /**
diff --git a/sequencer.c b/sequencer.c
index 6387c9ee6e..bf84a4f8ea 100644
--- a/sequencer.c
+++ b/sequencer.c
@@ -664,6 +664,111 @@ static int write_author_script(const char *message)
 	return res;
 }
 
+/**
+ * Take a series of KEY='VALUE' lines where VALUE part is
+ * sq-quoted, and append <KEY, VALUE> at the end of the string list
+ */
+static int parse_key_value_squoted(char *buf, struct string_list *list)
+{
+	while (*buf) {
+		struct string_list_item *item;
+		char *np;
+		char *cp = strchr(buf, '=');
+		if (!cp) {
+			np = strchrnul(buf, '\n');
+			return error(_("no key present in '%.*s'"),
+				     (int) (np - buf), buf);
+		}
+		np = strchrnul(cp, '\n');
+		*cp++ = '\0';
+		item = string_list_append(list, buf);
+
+		buf = np + (*np == '\n');
+		*np = '\0';
+		cp = sq_dequote(cp);
+		if (!cp)
+			return error(_("unable to dequote value of '%s'"),
+				     item->string);
+		item->util = xstrdup(cp);
+	}
+	return 0;
+}
+
+/**
+ * Reads and parses the state directory's "author-script" file, and sets name,
+ * email and date accordingly.
+ * Returns 0 on success, -1 if the file could not be parsed.
+ *
+ * The author script is of the format:
+ *
+ *	GIT_AUTHOR_NAME='$author_name'
+ *	GIT_AUTHOR_EMAIL='$author_email'
+ *	GIT_AUTHOR_DATE='$author_date'
+ *
+ * where $author_name, $author_email and $author_date are quoted. We are strict
+ * with our parsing, as the file was meant to be eval'd in the old
+ * git-am.sh/git-rebase--interactive.sh scripts, and thus if the file differs
+ * from what this function expects, it is better to bail out than to do
+ * something that the user does not expect.
+ */
+int read_author_script(const char *path, char **name, char **email, char **date,
+		       int allow_missing)
+{
+	struct strbuf buf = STRBUF_INIT;
+	struct string_list kv = STRING_LIST_INIT_DUP;
+	int retval = -1; /* assume failure */
+	int i, name_i = -2, email_i = -2, date_i = -2, err = 0;
+
+	if (strbuf_read_file(&buf, path, 256) <= 0) {
+		strbuf_release(&buf);
+		if (errno == ENOENT && allow_missing)
+			return 0;
+		else
+			return error_errno(_("could not open '%s' for reading"),
+					   path);
+	}
+
+	if (parse_key_value_squoted(buf.buf, &kv))
+		goto finish;
+
+	for (i = 0; i < kv.nr; i++) {
+		if (!strcmp(kv.items[i].string, "GIT_AUTHOR_NAME")) {
+			if (name_i != -2)
+				name_i = error(_("'GIT_AUTHOR_NAME' already given"));
+			else
+				name_i = i;
+		} else if (!strcmp(kv.items[i].string, "GIT_AUTHOR_EMAIL")) {
+			if (email_i != -2)
+				email_i = error(_("'GIT_AUTHOR_EMAIL' already given"));
+			else
+				email_i = i;
+		} else if (!strcmp(kv.items[i].string, "GIT_AUTHOR_DATE")) {
+			if (date_i != -2)
+				date_i = error(_("'GIT_AUTHOR_DATE' already given"));
+			else
+				date_i = i;
+		} else {
+			err = error(_("unknown variable '%s'"),
+				    kv.items[i].string);
+		}
+	}
+	if (name_i == -2)
+		error(_("missing 'GIT_AUTHOR_NAME'"));
+	if (email_i == -2)
+		error(_("missing 'GIT_AUTHOR_EMAIL'"));
+	if (date_i == -2)
+		error(_("missing 'GIT_AUTHOR_DATE'"));
+	if (date_i < 0 || email_i < 0 || date_i < 0 || err)
+		goto finish;
+	*name = kv.items[name_i].util;
+	*email = kv.items[email_i].util;
+	*date = kv.items[date_i].util;
+	retval = 0;
+finish:
+	string_list_clear(&kv, !!retval);
+	strbuf_release(&buf);
+	return retval;
+}
 
 /*
  * write_author_script() used to fail to terminate the last line with a "'" and
diff --git a/sequencer.h b/sequencer.h
index c986bc8251..60f15a4d9c 100644
--- a/sequencer.h
+++ b/sequencer.h
@@ -114,4 +114,7 @@ void commit_post_rewrite(const struct commit *current_head,
 #define SUMMARY_SHOW_AUTHOR_DATE (1 << 1)
 void print_commit_summary(const char *prefix, const struct object_id *oid,
 			  unsigned int flags);
+
+int read_author_script(const char *path, char **name, char **email, char **date,
+		       int allow_missing);
 #endif
-- 
2.19.1-708-g4ede3d42df


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v4 0/5] am/rebase: share read_author_script()
  2018-11-01  3:01   ` [PATCH v4 0/5] am/rebase: share read_author_script() Junio C Hamano
  2018-11-01  3:12     ` [PATCH v4fixed 2/5] am: improve author-script error reporting Junio C Hamano
@ 2018-11-01 16:27     ` Phillip Wood
  1 sibling, 0 replies; 41+ messages in thread
From: Phillip Wood @ 2018-11-01 16:27 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Git Mailing List, Eric Sunshine, Johannes Schindelin, Phillip Wood

Hi Junio

On 01/11/2018 03:01, Junio C Hamano wrote:
> Phillip Wood <phillip.wood@talktalk.net> writes:
> 
>> From: Phillip Wood <phillip.wood@dunelm.org.uk>
>>
>> Sorry for the confusion with v3, here are the updated patches.
>>
>> Thanks to Junio for the feedback on v2. I've updated patch 4 based on
>> those comments, the rest are unchanged.
> 
> The mistake of overwriting -1 (i.e. earlier we detected dup) with
> the third instance of the same originates at [2/5], so updating
> [4/5] without fixing it at its source would mean [4/5] is not a pure
> code movement to make it available to libgit users---it instead hides
> a (not so important) bugfix in it.

Facepalm. Thanks for clearing up my mess, what you've got in pu looks 
fine to me.

Thanks

Phillip

> 
>>
>> v1 cover letter:
>>
>> This is a follow up to pw/rebase-i-author-script-fix, it reduces code
>> duplication and improves rebase's parsing of the author script. After
>> this I'll do another series to share the code to write the author
>> script.
>>
>>
>> Phillip Wood (5):
>>    am: don't die in read_author_script()
>>    am: improve author-script error reporting
>>    am: rename read_author_script()
>>    add read_author_script() to libgit
>>    sequencer: use read_author_script()
>>
>>   builtin/am.c |  60 ++--------------
>>   sequencer.c  | 192 ++++++++++++++++++++++++++++++++-------------------
>>   sequencer.h  |   3 +
>>   3 files changed, 128 insertions(+), 127 deletions(-)


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v4 2/5] am: improve author-script error reporting
  2018-10-31 10:15   ` [PATCH v4 2/5] am: improve author-script error reporting Phillip Wood
@ 2018-11-04  4:12     ` Eric Sunshine
  2018-11-05  1:53       ` Junio C Hamano
  0 siblings, 1 reply; 41+ messages in thread
From: Eric Sunshine @ 2018-11-04  4:12 UTC (permalink / raw)
  To: Phillip Wood; +Cc: Git List, Junio C Hamano, Johannes Schindelin

On Wed, Oct 31, 2018 at 6:16 AM Phillip Wood <phillip.wood@talktalk.net> wrote:
> diff --git a/builtin/am.c b/builtin/am.c
> @@ -308,6 +312,7 @@ static int read_author_script(struct am_state *state)
> +       int i, name_i = -2, email_i = -2, date_i = -2, err = 0;
> @@ -326,14 +331,38 @@ static int read_author_script(struct am_state *state)
> +       for (i = 0; i < kv.nr; i++) {
> +               if (!strcmp(kv.items[i].string, "GIT_AUTHOR_NAME")) {
> +                       if (name_i >= 0)
> +                               name_i = error(_("'GIT_AUTHOR_NAME' already given"));
> +                       else
> +                               name_i = i;
> +               }
> +               ...
> +       }
> +       if (name_i == -2)
> +               error(_("missing 'GIT_AUTHOR_NAME'"));
> +       ...
> +       if (date_i < 0 || email_i < 0 || date_i < 0 || err)
>                 goto finish;
> +       state->author_name = kv.items[name_i].util;
> +       ...
>         retval = 0;
>  finish:
>         string_list_clear(&kv, !!retval);

Junio labeled the "-2" trick "cute", and while it is optimal in that
it only traverses the key/value list once, it also increases cognitive
load since the reader has to spend a good deal more brain cycles
understanding what is going on than would be the case with simpler
(and less noisily repetitive) code.

An alternative would be to make the code trivially easy to understand,
though a bit more costly, but that expense should be negligible since
this file should always be tiny, containing very few key/value pairs,
and it's not timing critical code anyhow. For instance:

static char *find(struct string_list *kv, const char *key)
{
    const char *val = NULL;
    int i;
    for (i = 0; i < kv.nr; i++) {
        if (!strcmp(kv.items[i].string, key)) {
            if (val) {
                error(_("duplicate %s"), key);
                return NULL;
            }
            val = kv.items[i].util;
        }
    }
    if (!val)
        error(_("missing %s"), key);
    return val;
}

static int read_author_script(struct am_state *state)
{
    ...
    char *name, *email, *date;
    ...
    name = find(&kv, "GIT_AUTHOR_NAME");
    email = find(&kv, "GIT_AUTHOR_EMAIL");
    date = find(&kv, "GIT_AUTHOR_DATE");
    if (name && email && date) {
        state->author_name = name;
        state->author_email = email;
        state->author_date = date;
        retval = 0;
    }
    string_list_clear&kv, !!retval);
    ...

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v4 2/5] am: improve author-script error reporting
  2018-11-04  4:12     ` Eric Sunshine
@ 2018-11-05  1:53       ` Junio C Hamano
  0 siblings, 0 replies; 41+ messages in thread
From: Junio C Hamano @ 2018-11-05  1:53 UTC (permalink / raw)
  To: Eric Sunshine; +Cc: Phillip Wood, Git List, Johannes Schindelin

Eric Sunshine <sunshine@sunshineco.com> writes:

> Junio labeled the "-2" trick "cute", and while it is optimal in that
> it only traverses the key/value list once, it also increases cognitive
> load since the reader has to spend a good deal more brain cycles
> understanding what is going on than would be the case with simpler
> (and less noisily repetitive) code.

... and update three copies of very similar looking code that have
to stay similar.  If this were parsing unbounded and customizable
set of variables, I think the approach you suggest to use a helper
that iterates over the whole array for each key makes sense, but for
now I think what was queued would be OK.

> An alternative would be to make the code trivially easy to understand,
> though a bit more costly, but that expense should be negligible since
> this file should always be tiny, containing very few key/value pairs,
> and it's not timing critical code anyhow. For instance:
>
> static char *find(struct string_list *kv, const char *key)
> {
>     const char *val = NULL;
>     int i;
>     for (i = 0; i < kv.nr; i++) {
>         if (!strcmp(kv.items[i].string, key)) {
>             if (val) {
>                 error(_("duplicate %s"), key);
>                 return NULL;
>             }
>             val = kv.items[i].util;
>         }
>     }
>     if (!val)
>         error(_("missing %s"), key);
>     return val;
> }
>
> static int read_author_script(struct am_state *state)
> {
>     ...
>     char *name, *email, *date;
>     ...
>     name = find(&kv, "GIT_AUTHOR_NAME");
>     email = find(&kv, "GIT_AUTHOR_EMAIL");
>     date = find(&kv, "GIT_AUTHOR_DATE");
>     if (name && email && date) {
>         state->author_name = name;
>         state->author_email = email;
>         state->author_date = date;
>         retval = 0;
>     }
>     string_list_clear&kv, !!retval);
>     ...

^ permalink raw reply	[flat|nested] 41+ messages in thread

end of thread, back to index

Thread overview: 41+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-09-12 10:10 [PATCH 0/3] am/rebase: share read_author_script() Phillip Wood
2018-09-12 10:10 ` [PATCH 1/3] am: rename read_author_script() Phillip Wood
2018-09-12 10:10 ` [PATCH 2/3] add read_author_script() to libgit Phillip Wood
2018-09-13 23:49   ` Eric Sunshine
2018-10-10 10:14     ` Phillip Wood
2018-10-10 10:22       ` Eric Sunshine
2018-09-12 10:10 ` [PATCH 3/3] sequencer: use read_author_script() Phillip Wood
2018-09-12 12:14   ` Eric Sunshine
2018-09-14  0:31   ` Eric Sunshine
2018-10-10 10:35     ` Phillip Wood
2018-10-18 10:00 ` [PATCH v2 0/5] am/rebase: share read_author_script() Phillip Wood
2018-10-18 10:00   ` [PATCH v2 1/5] am: don't die in read_author_script() Phillip Wood
2018-10-25  8:44     ` Junio C Hamano
2018-10-18 10:00   ` [PATCH v2 2/5] am: improve author-script error reporting Phillip Wood
2018-10-25  8:55     ` Junio C Hamano
2018-10-18 10:00   ` [PATCH v2 3/5] am: rename read_author_script() Phillip Wood
2018-10-18 10:00   ` [PATCH v2 4/5] add read_author_script() to libgit Phillip Wood
2018-10-18 10:00   ` [PATCH v2 5/5] sequencer: use read_author_script() Phillip Wood
2018-10-25  8:59   ` [PATCH v2 0/5] am/rebase: share read_author_script() Junio C Hamano
2018-10-26  9:00     ` Phillip Wood
2018-10-26 13:32       ` Junio C Hamano
2018-10-30 10:39 ` [PATCH v3 " Phillip Wood
2018-10-30 10:39   ` [PATCH v3 1/5] am: don't die in read_author_script() Phillip Wood
2018-10-30 10:39   ` [PATCH v3 2/5] am: improve author-script error reporting Phillip Wood
2018-10-30 10:39   ` [PATCH v3 3/5] am: rename read_author_script() Phillip Wood
2018-10-30 10:39   ` [PATCH v3 4/5] add read_author_script() to libgit Phillip Wood
2018-10-30 10:39   ` [PATCH v3 5/5] sequencer: use read_author_script() Phillip Wood
2018-10-31  2:50   ` [PATCH v3 0/5] am/rebase: share read_author_script() Junio C Hamano
2018-10-31 10:14     ` Phillip Wood
2018-10-31 10:15 ` [PATCH v4 " Phillip Wood
2018-10-31 10:15   ` [PATCH v4 1/5] am: don't die in read_author_script() Phillip Wood
2018-10-31 10:15   ` [PATCH v4 2/5] am: improve author-script error reporting Phillip Wood
2018-11-04  4:12     ` Eric Sunshine
2018-11-05  1:53       ` Junio C Hamano
2018-10-31 10:15   ` [PATCH v4 3/5] am: rename read_author_script() Phillip Wood
2018-10-31 10:15   ` [PATCH v4 4/5] add read_author_script() to libgit Phillip Wood
2018-10-31 10:15   ` [PATCH v4 5/5] sequencer: use read_author_script() Phillip Wood
2018-11-01  3:01   ` [PATCH v4 0/5] am/rebase: share read_author_script() Junio C Hamano
2018-11-01  3:12     ` [PATCH v4fixed 2/5] am: improve author-script error reporting Junio C Hamano
2018-11-01  3:12       ` [PATCH v4fixed 4/5] add read_author_script() to libgit Junio C Hamano
2018-11-01 16:27     ` [PATCH v4 0/5] am/rebase: share read_author_script() Phillip Wood

git@vger.kernel.org mailing list mirror (one of many)

Archives are clonable:
	git clone --mirror https://public-inbox.org/git
	git clone --mirror http://ou63pmih66umazou.onion/git
	git clone --mirror http://czquwvybam4bgbro.onion/git
	git clone --mirror http://hjrcffqmbrq6wope.onion/git

Newsgroups are available over NNTP:
	nntp://news.public-inbox.org/inbox.comp.version-control.git
	nntp://ou63pmih66umazou.onion/inbox.comp.version-control.git
	nntp://czquwvybam4bgbro.onion/inbox.comp.version-control.git
	nntp://hjrcffqmbrq6wope.onion/inbox.comp.version-control.git
	nntp://news.gmane.org/gmane.comp.version-control.git

 note: .onion URLs require Tor: https://www.torproject.org/
       or Tor2web: https://www.tor2web.org/

AGPL code for this site: git clone https://public-inbox.org/ public-inbox