git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [PATCH v2 0/4] fix "rebase -i --root" corrupting root commit
@ 2018-07-31  7:33 Eric Sunshine
  2018-07-31  7:33 ` [PATCH v2 1/4] sequencer: fix "rebase -i --root" corrupting author header Eric Sunshine
                   ` (9 more replies)
  0 siblings, 10 replies; 57+ messages in thread
From: Eric Sunshine @ 2018-07-31  7:33 UTC (permalink / raw)
  To: git
  Cc: Johannes Schindelin, Junio C Hamano, Phillip Wood, Akinori MUSHA,
	Eric Sunshine

This is a re-roll of [1] which fixes sequencer bugs resulting in commit
object corruption when "rebase -i --root" swaps in a new commit as root.
Unfortunately, those bugs made it into v2.18.0 and have already
corrupted at least one repository (a local project of mine). Patches 3/4
and 4/4 are new.

v1 fixed these bugs:

* trailing garbage on the commit's "author" header

* extra trailing digit on "author" header's timezone (caused by two
  separate bugs)

v2 fixes those same bugs, plus:

* eliminates a bogus "@" prepended to the "author" header timestamp
  which renders the header corrupt

* takes care to validate information coming from
  "rebase-merge/author-script" before incorporating it into the "author"
  header since that file may be hand-edited, and bogus hand-edited
  values could corrupt the commit object.

Patch 2/4 of this series conflicts with Akinori MUSHA's
'am/sequencer-author-script-fix' which takes a stab at fixing one of the
four (or so) bugs fixed by this series (namely, adding a missing closing
quote to GIT_AUTHOR_DATE in "rebase-merge/author-script"). That patch
probably ought to be dropped (without prejudice) in favor of this series
for the following reasons:

* It has the undesirable property of adding an unwanted extra blank line
  to "rebase-merge/author-script"; this series doesn't make that
  mistake.

* The fix in this series (patch 2/4) is more complete, also ensuring
  that the return value of sq_dequote() is checked.

* And, most importantly, this series sells the change as a fix for a
  genuine serious bug (commit object corruption), whereas that patch
  talks only about adjusting the file content to make it parseable by
  the shell. It's important for future readers of this change to
  understand that the bug (missing closing quote) had much more dire
  consequences than merely being syntactically invalid as a shell
  script.

The test added by Akinori MUSHA's patch may still have value, and it may
make sense to re-submit it, however, doing so need not hold up this
(higher priority) series.

Phillip's proposed[2] unification of parsers and other fixes and
cleanups can easily be built atop this series and, likewise, need not
prevent these (higher priority) patches from graduating independently.

[1]: https://public-inbox.org/git/20180730092929.71114-1-sunshine@sunshineco.com/
[2]: https://public-inbox.org/git/1f172fc1-4b51-cdf7-e841-5ca41e209be4@talktalk.net/

Eric Sunshine (4):
  sequencer: fix "rebase -i --root" corrupting author header
  sequencer: fix "rebase -i --root" corrupting author header timezone
  sequencer: fix "rebase -i --root" corrupting author header timestamp
  sequencer: don't die() on bogus user-edited timestamp

 sequencer.c                   | 39 +++++++++++++++++++++--------------
 t/t3404-rebase-interactive.sh | 10 ++++++++-
 2 files changed, 33 insertions(+), 16 deletions(-)

Range-diff against v1:
1:  ba10ae4e5d ! 1:  8c47555bcf sequencer: fix "rebase -i --root" corrupting author header
    @@ -11,8 +11,8 @@
         This is a result of read_author_ident() neglecting to NUL-terminate the
         buffer into which it composes the "author" header.
     
    -    (Note that the extra "0" in the timezone is a separate bug which will be
    -    fixed subsequently.)
    +    (Note that the "@" preceding the timestamp and the extra "0" in the
    +    timezone are separate bugs which will be fixed subsequently.)
     
         Security considerations: Construction of the "author" header by
         read_author_ident() happens in-place and in parallel with parsing the
    @@ -26,6 +26,7 @@
         inserted as part of the parsing process.
     
         Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
    +    Acked-by: Johannes Schindelin <johannes.schindelin@gmx.de>
     
     diff --git a/sequencer.c b/sequencer.c
     --- a/sequencer.c
    @@ -61,7 +62,7 @@
     +	set_fake_editor &&
     +	FAKE_LINES="2 1" git rebase -i --root &&
     +	git cat-file commit HEAD^ >out &&
    -+	grep "^author ..* @[0-9][0-9]* [-+][0-9][0-9]*$" out
    ++	grep "^author ..*> @[0-9][0-9]* [-+][0-9][0-9]*$" out
     +'
     +
      test_done
2:  c0400cda85 ! 2:  1d4064147e sequencer: fix "rebase -i --root" corrupting author header timezone
    @@ -22,6 +22,9 @@
         a NUL-terminator at the end of the shifted content, which explains the
         duplicated last digit in the timezone.
     
    +    (Note that the "@" preceding the timestamp is a separate bug which
    +    will be fixed subsequently.)
    +
         Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
     
     diff --git a/sequencer.c b/sequencer.c
    @@ -56,8 +59,8 @@
      	set_fake_editor &&
      	FAKE_LINES="2 1" git rebase -i --root &&
      	git cat-file commit HEAD^ >out &&
    --	grep "^author ..* @[0-9][0-9]* [-+][0-9][0-9]*$" out
    -+	grep "^author ..* @[0-9][0-9]* [-+][0-9][0-9][0-9][0-9]$" out
    +-	grep "^author ..*> @[0-9][0-9]* [-+][0-9][0-9]*$" out
    ++	grep "^author ..*> @[0-9][0-9]* [-+][0-9][0-9][0-9][0-9]$" out
      '
      
      test_done
-:  ---------- > 3:  cb65c7dd98 sequencer: fix "rebase -i --root" corrupting author header timestamp
-:  ---------- > 4:  3a624da9f4 sequencer: don't die() on bogus user-edited timestamp
-- 
2.18.0.267.gbc8be36ecb


^ permalink raw reply	[flat|nested] 57+ messages in thread

* [PATCH v2 1/4] sequencer: fix "rebase -i --root" corrupting author header
  2018-07-31  7:33 [PATCH v2 0/4] fix "rebase -i --root" corrupting root commit Eric Sunshine
@ 2018-07-31  7:33 ` Eric Sunshine
  2018-07-31  7:33 ` [PATCH v2 2/4] sequencer: fix "rebase -i --root" corrupting author header timezone Eric Sunshine
                   ` (8 subsequent siblings)
  9 siblings, 0 replies; 57+ messages in thread
From: Eric Sunshine @ 2018-07-31  7:33 UTC (permalink / raw)
  To: git
  Cc: Johannes Schindelin, Junio C Hamano, Phillip Wood, Akinori MUSHA,
	Eric Sunshine

When "git rebase -i --root" creates a new root commit (say, by swapping
in a different commit for the root), it corrupts the commit's "author"
header with trailing garbage:

    author A U Thor <author@example.com> @1112912773 -07000or@example.com

This is a result of read_author_ident() neglecting to NUL-terminate the
buffer into which it composes the "author" header.

(Note that the "@" preceding the timestamp and the extra "0" in the
timezone are separate bugs which will be fixed subsequently.)

Security considerations: Construction of the "author" header by
read_author_ident() happens in-place and in parallel with parsing the
content of "rebase-merge/author-script" which occupies the same buffer.
This is possible because the constructed "author" header is always
smaller than the content of "rebase-merge/author-script". Despite
neglecting to NUL-terminate the constructed "author" header, memory is
never accessed (either by read_author_ident() or its caller) beyond the
allocated buffer since a NUL-terminator is present at the end of the
loaded "rebase-merge/author-script" content, and additional NUL's are
inserted as part of the parsing process.

Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Acked-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 sequencer.c                   |  2 +-
 t/t3404-rebase-interactive.sh | 10 +++++++++-
 2 files changed, 10 insertions(+), 2 deletions(-)

diff --git a/sequencer.c b/sequencer.c
index 16c1411054..78864d9072 100644
--- a/sequencer.c
+++ b/sequencer.c
@@ -744,7 +744,7 @@ static const char *read_author_ident(struct strbuf *buf)
 		return NULL;
 	}
 
-	buf->len = out - buf->buf;
+	strbuf_setlen(buf, out - buf->buf);
 	return buf->buf;
 }
 
diff --git a/t/t3404-rebase-interactive.sh b/t/t3404-rebase-interactive.sh
index 01616901bd..d6e9b52740 100755
--- a/t/t3404-rebase-interactive.sh
+++ b/t/t3404-rebase-interactive.sh
@@ -1238,7 +1238,7 @@ rebase_setup_and_clean () {
 		test_might_fail git branch -D $1 &&
 		test_might_fail git rebase --abort
 	" &&
-	git checkout -b $1 master
+	git checkout -b $1 ${2:-master}
 }
 
 test_expect_success 'drop' '
@@ -1415,4 +1415,12 @@ test_expect_success 'rebase -i --gpg-sign=<key-id> overrides commit.gpgSign' '
 	test_i18ngrep "$SQ-S\"S I Gner\"$SQ" err
 '
 
+test_expect_success 'valid author header after --root swap' '
+	rebase_setup_and_clean author-header no-conflict-branch &&
+	set_fake_editor &&
+	FAKE_LINES="2 1" git rebase -i --root &&
+	git cat-file commit HEAD^ >out &&
+	grep "^author ..*> @[0-9][0-9]* [-+][0-9][0-9]*$" out
+'
+
 test_done
-- 
2.18.0.267.gbc8be36ecb


^ permalink raw reply related	[flat|nested] 57+ messages in thread

* [PATCH v2 2/4] sequencer: fix "rebase -i --root" corrupting author header timezone
  2018-07-31  7:33 [PATCH v2 0/4] fix "rebase -i --root" corrupting root commit Eric Sunshine
  2018-07-31  7:33 ` [PATCH v2 1/4] sequencer: fix "rebase -i --root" corrupting author header Eric Sunshine
@ 2018-07-31  7:33 ` Eric Sunshine
  2018-07-31  9:50   ` Phillip Wood
  2018-07-31  7:33 ` [PATCH v2 3/4] sequencer: fix "rebase -i --root" corrupting author header timestamp Eric Sunshine
                   ` (7 subsequent siblings)
  9 siblings, 1 reply; 57+ messages in thread
From: Eric Sunshine @ 2018-07-31  7:33 UTC (permalink / raw)
  To: git
  Cc: Johannes Schindelin, Junio C Hamano, Phillip Wood, Akinori MUSHA,
	Eric Sunshine

When "git rebase -i --root" creates a new root commit, it corrupts the
"author" header's timezone by repeating the last digit:

    author A U Thor <author@example.com> @1112912773 -07000

This is due to two bugs.

First, write_author_script() neglects to add the closing quote to the
value of GIT_AUTHOR_DATE when generating "rebase-merge/author-script".

Second, although sq_dequote() correctly diagnoses the missing closing
quote, read_author_ident() ignores sq_dequote()'s return value and
blindly uses the result of the aborted dequote.

sq_dequote() performs dequoting in-place by removing quoting and
shifting content downward. When it detects misquoting (lack of closing
quote, in this case), it gives up and returns an error without inserting
a NUL-terminator at the end of the shifted content, which explains the
duplicated last digit in the timezone.

(Note that the "@" preceding the timestamp is a separate bug which
will be fixed subsequently.)

Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
---
 sequencer.c                   | 7 ++++++-
 t/t3404-rebase-interactive.sh | 2 +-
 2 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/sequencer.c b/sequencer.c
index 78864d9072..1008f6d71a 100644
--- a/sequencer.c
+++ b/sequencer.c
@@ -654,6 +654,7 @@ static int write_author_script(const char *message)
 			strbuf_addch(&buf, *(message++));
 		else
 			strbuf_addf(&buf, "'\\\\%c'", *(message++));
+	strbuf_addch(&buf, '\'');
 	res = write_message(buf.buf, buf.len, rebase_path_author_script(), 1);
 	strbuf_release(&buf);
 	return res;
@@ -724,7 +725,11 @@ static const char *read_author_ident(struct strbuf *buf)
 
 		eol = strchrnul(in, '\n');
 		*eol = '\0';
-		sq_dequote(in);
+		if (!sq_dequote(in)) {
+			warning(_("bad quoting on %s value in '%s'"),
+				keys[i], rebase_path_author_script());
+			return NULL;
+		}
 		len = strlen(in);
 
 		if (i > 0) /* separate values by spaces */
diff --git a/t/t3404-rebase-interactive.sh b/t/t3404-rebase-interactive.sh
index d6e9b52740..fd3a18154e 100755
--- a/t/t3404-rebase-interactive.sh
+++ b/t/t3404-rebase-interactive.sh
@@ -1420,7 +1420,7 @@ test_expect_success 'valid author header after --root swap' '
 	set_fake_editor &&
 	FAKE_LINES="2 1" git rebase -i --root &&
 	git cat-file commit HEAD^ >out &&
-	grep "^author ..*> @[0-9][0-9]* [-+][0-9][0-9]*$" out
+	grep "^author ..*> @[0-9][0-9]* [-+][0-9][0-9][0-9][0-9]$" out
 '
 
 test_done
-- 
2.18.0.267.gbc8be36ecb


^ permalink raw reply related	[flat|nested] 57+ messages in thread

* [PATCH v2 3/4] sequencer: fix "rebase -i --root" corrupting author header timestamp
  2018-07-31  7:33 [PATCH v2 0/4] fix "rebase -i --root" corrupting root commit Eric Sunshine
  2018-07-31  7:33 ` [PATCH v2 1/4] sequencer: fix "rebase -i --root" corrupting author header Eric Sunshine
  2018-07-31  7:33 ` [PATCH v2 2/4] sequencer: fix "rebase -i --root" corrupting author header timezone Eric Sunshine
@ 2018-07-31  7:33 ` Eric Sunshine
  2018-07-31 10:00   ` Phillip Wood
  2018-07-31  7:33 ` [PATCH v2 4/4] sequencer: don't die() on bogus user-edited timestamp Eric Sunshine
                   ` (6 subsequent siblings)
  9 siblings, 1 reply; 57+ messages in thread
From: Eric Sunshine @ 2018-07-31  7:33 UTC (permalink / raw)
  To: git
  Cc: Johannes Schindelin, Junio C Hamano, Phillip Wood, Akinori MUSHA,
	Eric Sunshine

When "git rebase -i --root" creates a new root commit, it corrupts the
"author" header's timestamp by prepending a "@":

    author A U Thor <author@example.com> @1112912773 -0700

The commit parser is very strict about the format of the "author"
header, and does not allow a "@" in that position.

The "@" comes from GIT_AUTHOR_DATE in "rebase-merge/author-script",
signifying a Unix epoch-based timestamp, however, read_author_ident()
incorrectly allows it to slip into the commit's "author" header, thus
corrupting it.

One possible fix would be simply to filter out the "@" when constructing
the "author" header timestamp, however, a more correct fix is to parse
the GIT_AUTHOR_DATE date (via parse_date()) and format the parsed result
into the "author" header. Since "rebase-merge/author-script" may be
edited by the user, this approach has the extra benefit of catching
other potential timestamp corruption due to hand-editing.

We can do better than calling parse_date() ourselves and constructing
the "author" header manually, however, by instead taking advantage of
fmt_ident() which does this work for us.

The benefits of using fmt_ident() are twofold. First, it simplifies the
logic considerably by allowing us to avoid the complexity of building
the "author" header in parallel with and in the same buffer from which
"rebase-merge/author-script" is being parsed. Instead, fmt_ident() is
invoked to compose the header after parsing is complete.

Second, fmt_ident() is careful to prevent "crud" from polluting the
composed ident. As with validating GIT_AUTHOR_DATE, this "crud"
avoidance prevents other (possibly hand-edited) bogus author information
from "rebase-merge/author-script" from corrupting the commit object.

Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
---
 sequencer.c                   | 23 +++++++++--------------
 t/t3404-rebase-interactive.sh |  2 +-
 2 files changed, 10 insertions(+), 15 deletions(-)

diff --git a/sequencer.c b/sequencer.c
index 1008f6d71a..15a66a334c 100644
--- a/sequencer.c
+++ b/sequencer.c
@@ -709,14 +709,16 @@ static const char *read_author_ident(struct strbuf *buf)
 	const char *keys[] = {
 		"GIT_AUTHOR_NAME=", "GIT_AUTHOR_EMAIL=", "GIT_AUTHOR_DATE="
 	};
-	char *in, *out, *eol;
-	int i = 0, len;
+	struct strbuf out = STRBUF_INIT;
+	char *in, *eol;
+	const char *val[3];
+	int i = 0;
 
 	if (strbuf_read_file(buf, rebase_path_author_script(), 256) <= 0)
 		return NULL;
 
 	/* dequote values and construct ident line in-place */
-	for (in = out = buf->buf; i < 3 && in - buf->buf < buf->len; i++) {
+	for (in = buf->buf; i < 3 && in - buf->buf < buf->len; i++) {
 		if (!skip_prefix(in, keys[i], (const char **)&in)) {
 			warning("could not parse '%s' (looking for '%s'",
 				rebase_path_author_script(), keys[i]);
@@ -730,16 +732,7 @@ static const char *read_author_ident(struct strbuf *buf)
 				keys[i], rebase_path_author_script());
 			return NULL;
 		}
-		len = strlen(in);
-
-		if (i > 0) /* separate values by spaces */
-			*(out++) = ' ';
-		if (i == 1) /* email needs to be surrounded by <...> */
-			*(out++) = '<';
-		memmove(out, in, len);
-		out += len;
-		if (i == 1) /* email needs to be surrounded by <...> */
-			*(out++) = '>';
+		val[i] = in;
 		in = eol + 1;
 	}
 
@@ -749,7 +742,9 @@ static const char *read_author_ident(struct strbuf *buf)
 		return NULL;
 	}
 
-	strbuf_setlen(buf, out - buf->buf);
+	strbuf_addstr(&out, fmt_ident(val[0], val[1], val[2], 0));
+	strbuf_swap(buf, &out);
+	strbuf_release(&out);
 	return buf->buf;
 }
 
diff --git a/t/t3404-rebase-interactive.sh b/t/t3404-rebase-interactive.sh
index fd3a18154e..d340018781 100755
--- a/t/t3404-rebase-interactive.sh
+++ b/t/t3404-rebase-interactive.sh
@@ -1420,7 +1420,7 @@ test_expect_success 'valid author header after --root swap' '
 	set_fake_editor &&
 	FAKE_LINES="2 1" git rebase -i --root &&
 	git cat-file commit HEAD^ >out &&
-	grep "^author ..*> @[0-9][0-9]* [-+][0-9][0-9][0-9][0-9]$" out
+	grep "^author ..*> [0-9][0-9]* [-+][0-9][0-9][0-9][0-9]$" out
 '
 
 test_done
-- 
2.18.0.267.gbc8be36ecb


^ permalink raw reply related	[flat|nested] 57+ messages in thread

* [PATCH v2 4/4] sequencer: don't die() on bogus user-edited timestamp
  2018-07-31  7:33 [PATCH v2 0/4] fix "rebase -i --root" corrupting root commit Eric Sunshine
                   ` (2 preceding siblings ...)
  2018-07-31  7:33 ` [PATCH v2 3/4] sequencer: fix "rebase -i --root" corrupting author header timestamp Eric Sunshine
@ 2018-07-31  7:33 ` Eric Sunshine
  2018-07-31 10:02   ` Phillip Wood
  2018-07-31 10:05 ` [PATCH v2 0/4] fix "rebase -i --root" corrupting root commit Phillip Wood
                   ` (5 subsequent siblings)
  9 siblings, 1 reply; 57+ messages in thread
From: Eric Sunshine @ 2018-07-31  7:33 UTC (permalink / raw)
  To: git
  Cc: Johannes Schindelin, Junio C Hamano, Phillip Wood, Akinori MUSHA,
	Eric Sunshine

read_author_ident() is careful to handle errors "gently" when parsing
"rebase-merge/author-script" by printing a suitable warning and
returning NULL; it never die()'s. One possible reason that parsing might
fail is that "rebase-merge/author-script" has been hand-edited in such a
way which corrupts it or the information it contains.

However, read_author_ident() invokes fmt_ident() which is not so careful
about failing "gently". It will die() if it encounters a malformed
timestamp. Since read_author_ident() doesn't want to die() and since
it's dealing with possibly hand-edited data, take care to avoid passing
a bogus timestamp to fmt_ident().

A more "correctly engineered" fix would be to add a "gentle" version of
fmt_ident(), however, such a change it outside the scope of the bug-fix
series. If fmt_ident() ever does grow a "gentle" cousin, then the manual
timestamp check added here can be retired.

Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
---
 sequencer.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/sequencer.c b/sequencer.c
index 15a66a334c..9b6cdb6ff8 100644
--- a/sequencer.c
+++ b/sequencer.c
@@ -742,6 +742,15 @@ static const char *read_author_ident(struct strbuf *buf)
 		return NULL;
 	}
 
+	/* validate date since fmt_ident() will die() on bad value */
+	if (parse_date(val[2], &out)){
+		warning(_("invalid date format '%s' in '%s'"),
+			val[2], rebase_path_author_script());
+		strbuf_release(&out);
+		return NULL;
+	}
+
+	strbuf_reset(&out);
 	strbuf_addstr(&out, fmt_ident(val[0], val[1], val[2], 0));
 	strbuf_swap(buf, &out);
 	strbuf_release(&out);
-- 
2.18.0.267.gbc8be36ecb


^ permalink raw reply related	[flat|nested] 57+ messages in thread

* Re: [PATCH v2 2/4] sequencer: fix "rebase -i --root" corrupting author header timezone
  2018-07-31  7:33 ` [PATCH v2 2/4] sequencer: fix "rebase -i --root" corrupting author header timezone Eric Sunshine
@ 2018-07-31  9:50   ` Phillip Wood
  2018-07-31 10:15     ` Eric Sunshine
  0 siblings, 1 reply; 57+ messages in thread
From: Phillip Wood @ 2018-07-31  9:50 UTC (permalink / raw)
  To: Eric Sunshine, git; +Cc: Johannes Schindelin, Junio C Hamano, Akinori MUSHA

On 31/07/18 08:33, Eric Sunshine wrote:
> When "git rebase -i --root" creates a new root commit, it corrupts the
> "author" header's timezone by repeating the last digit:
> 
>      author A U Thor <author@example.com> @1112912773 -07000
> 
> This is due to two bugs.
> 
> First, write_author_script() neglects to add the closing quote to the
> value of GIT_AUTHOR_DATE when generating "rebase-merge/author-script".
> 
> Second, although sq_dequote() correctly diagnoses the missing closing
> quote, read_author_ident() ignores sq_dequote()'s return value and
> blindly uses the result of the aborted dequote.
> 
> sq_dequote() performs dequoting in-place by removing quoting and
> shifting content downward. When it detects misquoting (lack of closing
> quote, in this case), it gives up and returns an error without inserting
> a NUL-terminator at the end of the shifted content, which explains the
> duplicated last digit in the timezone.
> 
> (Note that the "@" preceding the timestamp is a separate bug which
> will be fixed subsequently.)
> 
> Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
> ---
>   sequencer.c                   | 7 ++++++-
>   t/t3404-rebase-interactive.sh | 2 +-
>   2 files changed, 7 insertions(+), 2 deletions(-)
> 
> diff --git a/sequencer.c b/sequencer.c
> index 78864d9072..1008f6d71a 100644
> --- a/sequencer.c
> +++ b/sequencer.c
> @@ -654,6 +654,7 @@ static int write_author_script(const char *message)
>   			strbuf_addch(&buf, *(message++));
>   		else
>   			strbuf_addf(&buf, "'\\\\%c'", *(message++));
> +	strbuf_addch(&buf, '\'');
>   	res = write_message(buf.buf, buf.len, rebase_path_author_script(), 1);
>   	strbuf_release(&buf);
>   	return res;
> @@ -724,7 +725,11 @@ static const char *read_author_ident(struct strbuf *buf)
>   
>   		eol = strchrnul(in, '\n');
>   		*eol = '\0';
> -		sq_dequote(in);
> +		if (!sq_dequote(in)) {
> +			warning(_("bad quoting on %s value in '%s'"),
> +				keys[i], rebase_path_author_script());
> +			return NULL;

Unfortunately the caller does not treat NULL as an error, so this will 
change the date and potentially the author of the commit. While that 
isn't corruption in the sense that it creates a sane commit, it does 
corrupt the author data compared to its expected value. I think it would 
be better to die in the short term, or fix the caller.

> +		}
>   		len = strlen(in);
>   
>   		if (i > 0) /* separate values by spaces */
> diff --git a/t/t3404-rebase-interactive.sh b/t/t3404-rebase-interactive.sh
> index d6e9b52740..fd3a18154e 100755
> --- a/t/t3404-rebase-interactive.sh
> +++ b/t/t3404-rebase-interactive.sh
> @@ -1420,7 +1420,7 @@ test_expect_success 'valid author header after --root swap' '
>   	set_fake_editor &&
>   	FAKE_LINES="2 1" git rebase -i --root &&
>   	git cat-file commit HEAD^ >out &&
> -	grep "^author ..*> @[0-9][0-9]* [-+][0-9][0-9]*$" out
> +	grep "^author ..*> @[0-9][0-9]* [-+][0-9][0-9][0-9][0-9]$" out
>   '
>   
>   test_done
> 


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v2 3/4] sequencer: fix "rebase -i --root" corrupting author header timestamp
  2018-07-31  7:33 ` [PATCH v2 3/4] sequencer: fix "rebase -i --root" corrupting author header timestamp Eric Sunshine
@ 2018-07-31 10:00   ` Phillip Wood
  2018-07-31 10:30     ` Eric Sunshine
  0 siblings, 1 reply; 57+ messages in thread
From: Phillip Wood @ 2018-07-31 10:00 UTC (permalink / raw)
  To: Eric Sunshine, git; +Cc: Johannes Schindelin, Junio C Hamano, Akinori MUSHA

On 31/07/18 08:33, Eric Sunshine wrote:
> When "git rebase -i --root" creates a new root commit, it corrupts the
> "author" header's timestamp by prepending a "@":
> 
>      author A U Thor <author@example.com> @1112912773 -0700
> 
> The commit parser is very strict about the format of the "author"
> header, and does not allow a "@" in that position.
> 
> The "@" comes from GIT_AUTHOR_DATE in "rebase-merge/author-script",
> signifying a Unix epoch-based timestamp, however, read_author_ident()
> incorrectly allows it to slip into the commit's "author" header, thus
> corrupting it.
> 
> One possible fix would be simply to filter out the "@" when constructing
> the "author" header timestamp, however, a more correct fix is to parse
> the GIT_AUTHOR_DATE date (via parse_date()) and format the parsed result
> into the "author" header. Since "rebase-merge/author-script" may be
> edited by the user, this approach has the extra benefit of catching
> other potential timestamp corruption due to hand-editing.
> 
> We can do better than calling parse_date() ourselves and constructing
> the "author" header manually, however, by instead taking advantage of
> fmt_ident() which does this work for us.
> 
> The benefits of using fmt_ident() are twofold. First, it simplifies the
> logic considerably by allowing us to avoid the complexity of building
> the "author" header in parallel with and in the same buffer from which
> "rebase-merge/author-script" is being parsed. Instead, fmt_ident() is
> invoked to compose the header after parsing is complete.
> 
> Second, fmt_ident() is careful to prevent "crud" from polluting the
> composed ident. As with validating GIT_AUTHOR_DATE, this "crud"
> avoidance prevents other (possibly hand-edited) bogus author information
> from "rebase-merge/author-script" from corrupting the commit object.
> 
> Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
> ---
>   sequencer.c                   | 23 +++++++++--------------
>   t/t3404-rebase-interactive.sh |  2 +-
>   2 files changed, 10 insertions(+), 15 deletions(-)
> 
> diff --git a/sequencer.c b/sequencer.c
> index 1008f6d71a..15a66a334c 100644
> --- a/sequencer.c
> +++ b/sequencer.c
> @@ -709,14 +709,16 @@ static const char *read_author_ident(struct strbuf *buf)
>   	const char *keys[] = {
>   		"GIT_AUTHOR_NAME=", "GIT_AUTHOR_EMAIL=", "GIT_AUTHOR_DATE="
>   	};
> -	char *in, *out, *eol;
> -	int i = 0, len;
> +	struct strbuf out = STRBUF_INIT;
> +	char *in, *eol;
> +	const char *val[3];
> +	int i = 0;
>   
>   	if (strbuf_read_file(buf, rebase_path_author_script(), 256) <= 0)
>   		return NULL;
>   
>   	/* dequote values and construct ident line in-place */
> -	for (in = out = buf->buf; i < 3 && in - buf->buf < buf->len; i++) {
> +	for (in = buf->buf; i < 3 && in - buf->buf < buf->len; i++) {
>   		if (!skip_prefix(in, keys[i], (const char **)&in)) {
>   			warning("could not parse '%s' (looking for '%s'",
>   				rebase_path_author_script(), keys[i]);
> @@ -730,16 +732,7 @@ static const char *read_author_ident(struct strbuf *buf)
>   				keys[i], rebase_path_author_script());
>   			return NULL;
>   		}
> -		len = strlen(in);
> -
> -		if (i > 0) /* separate values by spaces */
> -			*(out++) = ' ';
> -		if (i == 1) /* email needs to be surrounded by <...> */
> -			*(out++) = '<';
> -		memmove(out, in, len);
> -		out += len;
> -		if (i == 1) /* email needs to be surrounded by <...> */
> -			*(out++) = '>';
> +		val[i] = in;
>   		in = eol + 1;
>   	}
>   
> @@ -749,7 +742,9 @@ static const char *read_author_ident(struct strbuf *buf)
>   		return NULL;
>   	}
>   
> -	strbuf_setlen(buf, out - buf->buf);
> +	strbuf_addstr(&out, fmt_ident(val[0], val[1], val[2], 0));
> +	strbuf_swap(buf, &out);
> +	strbuf_release(&out);
>   	return buf->buf;
>   }

This is a welcome simplification

> diff --git a/t/t3404-rebase-interactive.sh b/t/t3404-rebase-interactive.sh
> index fd3a18154e..d340018781 100755
> --- a/t/t3404-rebase-interactive.sh
> +++ b/t/t3404-rebase-interactive.sh

Now that the author is correct, can we test_cmp() it against its 
expected value to make sure there are no hidden surprises in the name 
and email in the future. (It would be reassuring to test an author with 
"'" in the name as well but that is out of scope for this series.)

> @@ -1420,7 +1420,7 @@ test_expect_success 'valid author header after --root swap' '
+	git cat-file commit HEAD^ |grep ^author >expected &&
>   	set_fake_editor &&
>   	FAKE_LINES="2 1" git rebase -i --root &&
>   	git cat-file commit HEAD^ >out &&
-	git cat-file commit HEAD^ >out &&
> -	grep "^author ..*> @[0-9][0-9]* [-+][0-9][0-9][0-9][0-9]$" out
+	git cat-file commit HEAD^ |grep ^author >out &&
+	test_cmp expected out	
> +	grep "^author ..*> [0-9][0-9]* [-+][0-9][0-9][0-9][0-9]$" out
>   ' >   test_done
> 


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v2 4/4] sequencer: don't die() on bogus user-edited timestamp
  2018-07-31  7:33 ` [PATCH v2 4/4] sequencer: don't die() on bogus user-edited timestamp Eric Sunshine
@ 2018-07-31 10:02   ` Phillip Wood
  2018-07-31 10:38     ` Eric Sunshine
  0 siblings, 1 reply; 57+ messages in thread
From: Phillip Wood @ 2018-07-31 10:02 UTC (permalink / raw)
  To: Eric Sunshine, git; +Cc: Johannes Schindelin, Junio C Hamano, Akinori MUSHA

On 31/07/18 08:33, Eric Sunshine wrote:
> read_author_ident() is careful to handle errors "gently" when parsing
> "rebase-merge/author-script" by printing a suitable warning and
> returning NULL; it never die()'s. One possible reason that parsing might
> fail is that "rebase-merge/author-script" has been hand-edited in such a
> way which corrupts it or the information it contains.
> 
> However, read_author_ident() invokes fmt_ident() which is not so careful
> about failing "gently". It will die() if it encounters a malformed
> timestamp. Since read_author_ident() doesn't want to die() and since
> it's dealing with possibly hand-edited data, take care to avoid passing
> a bogus timestamp to fmt_ident().
> 
> A more "correctly engineered" fix would be to add a "gentle" version of
> fmt_ident(), however, such a change it outside the scope of the bug-fix
> series. If fmt_ident() ever does grow a "gentle" cousin, then the manual
> timestamp check added here can be retired.
> 
> Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
> ---
>   sequencer.c | 9 +++++++++
>   1 file changed, 9 insertions(+)
> 
> diff --git a/sequencer.c b/sequencer.c
> index 15a66a334c..9b6cdb6ff8 100644
> --- a/sequencer.c
> +++ b/sequencer.c
> @@ -742,6 +742,15 @@ static const char *read_author_ident(struct strbuf *buf)
>   		return NULL;
>   	}
>   
> +	/* validate date since fmt_ident() will die() on bad value */
> +	if (parse_date(val[2], &out)){
> +		warning(_("invalid date format '%s' in '%s'"),
> +			val[2], rebase_path_author_script());
> +		strbuf_release(&out);
> +		return NULL;
> +	}

I think if you're going to do this then the caller needs to be changed 
to treat NULL as an error

> +	strbuf_reset(&out);
>   	strbuf_addstr(&out, fmt_ident(val[0], val[1], val[2], 0));
>   	strbuf_swap(buf, &out);
>   	strbuf_release(&out);
> 


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v2 0/4] fix "rebase -i --root" corrupting root commit
  2018-07-31  7:33 [PATCH v2 0/4] fix "rebase -i --root" corrupting root commit Eric Sunshine
                   ` (3 preceding siblings ...)
  2018-07-31  7:33 ` [PATCH v2 4/4] sequencer: don't die() on bogus user-edited timestamp Eric Sunshine
@ 2018-07-31 10:05 ` Phillip Wood
  2018-07-31 10:46   ` Eric Sunshine
  2018-07-31 11:15 ` [PATCH v2 0/2] Fix author script quoting Phillip Wood
                   ` (4 subsequent siblings)
  9 siblings, 1 reply; 57+ messages in thread
From: Phillip Wood @ 2018-07-31 10:05 UTC (permalink / raw)
  To: Eric Sunshine, git; +Cc: Johannes Schindelin, Junio C Hamano, Akinori MUSHA

Hi Eric

On 31/07/18 08:33, Eric Sunshine wrote:
> This is a re-roll of [1] which fixes sequencer bugs resulting in commit
> object corruption when "rebase -i --root" swaps in a new commit as root.
> Unfortunately, those bugs made it into v2.18.0 and have already
> corrupted at least one repository (a local project of mine). Patches 3/4
> and 4/4 are new.
> 
> v1 fixed these bugs:
> 
> * trailing garbage on the commit's "author" header
> 
> * extra trailing digit on "author" header's timezone (caused by two
>    separate bugs)
> 
> v2 fixes those same bugs, plus:
> 
> * eliminates a bogus "@" prepended to the "author" header timestamp
>    which renders the header corrupt
> 
> * takes care to validate information coming from
>    "rebase-merge/author-script" before incorporating it into the "author"
>    header since that file may be hand-edited, and bogus hand-edited
>    values could corrupt the commit object.
> 
> Patch 2/4 of this series conflicts with Akinori MUSHA's
> 'am/sequencer-author-script-fix' which takes a stab at fixing one of the
> four (or so) bugs fixed by this series (namely, adding a missing closing
> quote to GIT_AUTHOR_DATE in "rebase-merge/author-script"). That patch
> probably ought to be dropped (without prejudice) in favor of this series
> for the following reasons:
> 
> * It has the undesirable property of adding an unwanted extra blank line
>    to "rebase-merge/author-script"; this series doesn't make that
>    mistake.
> 
> * The fix in this series (patch 2/4) is more complete, also ensuring
>    that the return value of sq_dequote() is checked.
> 
> * And, most importantly, this series sells the change as a fix for a
>    genuine serious bug (commit object corruption), whereas that patch
>    talks only about adjusting the file content to make it parseable by
>    the shell. It's important for future readers of this change to
>    understand that the bug (missing closing quote) had much more dire
>    consequences than merely being syntactically invalid as a shell
>    script.
> 
> The test added by Akinori MUSHA's patch may still have value, and it may
> make sense to re-submit it, however, doing so need not hold up this
> (higher priority) series.

Yes I think it does, also the patch that Johannes and I have on top of 
it to fix the quoting of "'" in write_author_script() relies on fixing 
the missing trailing quote and handling of "'" at the same time, 
hopefully we can get that rebased on top of these asap.

Best Wishes

Phillip

> Phillip's proposed[2] unification of parsers and other fixes and
> cleanups can easily be built atop this series and, likewise, need not
> prevent these (higher priority) patches from graduating independently.
> 
> [1]: https://public-inbox.org/git/20180730092929.71114-1-sunshine@sunshineco.com/
> [2]: https://public-inbox.org/git/1f172fc1-4b51-cdf7-e841-5ca41e209be4@talktalk.net/
> 
> Eric Sunshine (4):
>    sequencer: fix "rebase -i --root" corrupting author header
>    sequencer: fix "rebase -i --root" corrupting author header timezone
>    sequencer: fix "rebase -i --root" corrupting author header timestamp
>    sequencer: don't die() on bogus user-edited timestamp
> 
>   sequencer.c                   | 39 +++++++++++++++++++++--------------
>   t/t3404-rebase-interactive.sh | 10 ++++++++-
>   2 files changed, 33 insertions(+), 16 deletions(-)
> 
> Range-diff against v1:
> 1:  ba10ae4e5d ! 1:  8c47555bcf sequencer: fix "rebase -i --root" corrupting author header
>      @@ -11,8 +11,8 @@
>           This is a result of read_author_ident() neglecting to NUL-terminate the
>           buffer into which it composes the "author" header.
>       
>      -    (Note that the extra "0" in the timezone is a separate bug which will be
>      -    fixed subsequently.)
>      +    (Note that the "@" preceding the timestamp and the extra "0" in the
>      +    timezone are separate bugs which will be fixed subsequently.)
>       
>           Security considerations: Construction of the "author" header by
>           read_author_ident() happens in-place and in parallel with parsing the
>      @@ -26,6 +26,7 @@
>           inserted as part of the parsing process.
>       
>           Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
>      +    Acked-by: Johannes Schindelin <johannes.schindelin@gmx.de>
>       
>       diff --git a/sequencer.c b/sequencer.c
>       --- a/sequencer.c
>      @@ -61,7 +62,7 @@
>       +	set_fake_editor &&
>       +	FAKE_LINES="2 1" git rebase -i --root &&
>       +	git cat-file commit HEAD^ >out &&
>      -+	grep "^author ..* @[0-9][0-9]* [-+][0-9][0-9]*$" out
>      ++	grep "^author ..*> @[0-9][0-9]* [-+][0-9][0-9]*$" out
>       +'
>       +
>        test_done
> 2:  c0400cda85 ! 2:  1d4064147e sequencer: fix "rebase -i --root" corrupting author header timezone
>      @@ -22,6 +22,9 @@
>           a NUL-terminator at the end of the shifted content, which explains the
>           duplicated last digit in the timezone.
>       
>      +    (Note that the "@" preceding the timestamp is a separate bug which
>      +    will be fixed subsequently.)
>      +
>           Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
>       
>       diff --git a/sequencer.c b/sequencer.c
>      @@ -56,8 +59,8 @@
>        	set_fake_editor &&
>        	FAKE_LINES="2 1" git rebase -i --root &&
>        	git cat-file commit HEAD^ >out &&
>      --	grep "^author ..* @[0-9][0-9]* [-+][0-9][0-9]*$" out
>      -+	grep "^author ..* @[0-9][0-9]* [-+][0-9][0-9][0-9][0-9]$" out
>      +-	grep "^author ..*> @[0-9][0-9]* [-+][0-9][0-9]*$" out
>      ++	grep "^author ..*> @[0-9][0-9]* [-+][0-9][0-9][0-9][0-9]$" out
>        '
>        
>        test_done
> -:  ---------- > 3:  cb65c7dd98 sequencer: fix "rebase -i --root" corrupting author header timestamp
> -:  ---------- > 4:  3a624da9f4 sequencer: don't die() on bogus user-edited timestamp
> 


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v2 2/4] sequencer: fix "rebase -i --root" corrupting author header timezone
  2018-07-31  9:50   ` Phillip Wood
@ 2018-07-31 10:15     ` Eric Sunshine
  0 siblings, 0 replies; 57+ messages in thread
From: Eric Sunshine @ 2018-07-31 10:15 UTC (permalink / raw)
  To: Phillip Wood; +Cc: Git List, Johannes Schindelin, Junio C Hamano, Akinori MUSHA

On Tue, Jul 31, 2018 at 5:50 AM Phillip Wood <phillip.wood@talktalk.net> wrote:
> On 31/07/18 08:33, Eric Sunshine wrote:
> > -             sq_dequote(in);
> > +             if (!sq_dequote(in)) {
> > +                     warning(_("bad quoting on %s value in '%s'"),
> > +                             keys[i], rebase_path_author_script());
> > +                     return NULL;
>
> Unfortunately the caller does not treat NULL as an error, so this will
> change the date and potentially the author of the commit. While that
> isn't corruption in the sense that it creates a sane commit, it does
> corrupt the author data compared to its expected value. I think it would
> be better to die in the short term, or fix the caller.

Although the caller may not treat NULL as an _error_, it is
nevertheless prepared to handle NULL. Moreover, this new code mirrors
existing behavior in read_author_ident() in which it already returns
NULL for other errors encountered during the parse, so I think
returning NULL is the correct thing to do within the scope of this
patch series, especially considering that this series is about fixing
genuine commit object corruption from which a typical will be unable
to recover (and may not even notice until some time in the future).

While the unexpected change in value you describe is unfortunate, this
patch neither helps nor hurts that since the problem already exists in
that other parts of this function already return NULL, and since the
junk value used by read_author_ident() before this patch was already
an "unexpected change". More importantly, the commit object itself is
not corrupted by this issue, and this sort of "error" is something
from which a typical user is likely to be able to recover, so IMHO is
outside the scope of the critical bug fixes of this series. Fixing
that issue can easily be done on top of this series.

Thanks for thinking critically about this.

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v2 3/4] sequencer: fix "rebase -i --root" corrupting author header timestamp
  2018-07-31 10:00   ` Phillip Wood
@ 2018-07-31 10:30     ` Eric Sunshine
  0 siblings, 0 replies; 57+ messages in thread
From: Eric Sunshine @ 2018-07-31 10:30 UTC (permalink / raw)
  To: Phillip Wood; +Cc: Git List, Johannes Schindelin, Junio C Hamano, Akinori MUSHA

On Tue, Jul 31, 2018 at 6:01 AM Phillip Wood <phillip.wood@talktalk.net> wrote:
> Now that the author is correct, can we test_cmp() it against its
> expected value to make sure there are no hidden surprises in the name
> and email in the future. (It would be reassuring to test an author with
> "'" in the name as well but that is out of scope for this series.)
>
> +       git cat-file commit HEAD^ |grep ^author >expected &&
> >       set_fake_editor &&
> >       FAKE_LINES="2 1" git rebase -i --root &&
> >       git cat-file commit HEAD^ >out &&
> -       git cat-file commit HEAD^ >out &&
> > -     grep "^author ..*> @[0-9][0-9]* [-+][0-9][0-9][0-9][0-9]$" out
> +       git cat-file commit HEAD^ |grep ^author >out &&
> +       test_cmp expected out
> > +     grep "^author ..*> [0-9][0-9]* [-+][0-9][0-9][0-9][0-9]$" out
> >   ' >   test_done

I deliberately avoided test_cmp() in favor of 'grep' specifically
because I didn't want to hard-code the timestamp into the 'expect'
file since future changes to tests preceding this one could
potentially outdate the hard-coded value, and setting up my own commit
to ensure a consistent timestamp would have made this test longer and
less "obvious".

However, your approach sidesteps those concerns nicely. That said, I
think such a simplification could be done on top of this patch since
the current change to the test in this patch makes it very clear (to
the reader) that the "@" problem has been corrected, whereas it would
not be at all obvious if this patch incorporated your simplification.

While your idea is nice, I'd rather not re-roll this series just for
that. (I'd really like to see these fixes for this critical commit
object corruption land as soon as possible without re-rolling
repeatedly for "optional" or less important changes.) Perhaps such a
simplification can be done in the series you're working on(?).

Thanks for the review.

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v2 4/4] sequencer: don't die() on bogus user-edited timestamp
  2018-07-31 10:02   ` Phillip Wood
@ 2018-07-31 10:38     ` Eric Sunshine
  0 siblings, 0 replies; 57+ messages in thread
From: Eric Sunshine @ 2018-07-31 10:38 UTC (permalink / raw)
  To: Phillip Wood; +Cc: Git List, Johannes Schindelin, Junio C Hamano, Akinori MUSHA

On Tue, Jul 31, 2018 at 6:02 AM Phillip Wood <phillip.wood@talktalk.net> wrote:
> On 31/07/18 08:33, Eric Sunshine wrote:
> > +     /* validate date since fmt_ident() will die() on bad value */
> > +     if (parse_date(val[2], &out)){
> > +             warning(_("invalid date format '%s' in '%s'"),
> > +                     val[2], rebase_path_author_script());
> > +             strbuf_release(&out);
> > +             return NULL;
> > +     }
>
> I think if you're going to do this then the caller needs to be changed
> to treat NULL as an error

I humbly disagree for the reasons outlined in my response[1] to your
review of 2/4. Summarized, this function already returns NULL in
several cases upon error, so the change made by this patch is
consistent with existing behavior (and certainly doesn't make the
situation worse). This patch series is about fixing the critical bugs
resulting in genuine commit object corruption; it's not about
re-engineering a case which was perhaps not thought through thoroughly
(changing commit data unexpectedly) but which does not genuinely
corrupt the commit object itself.

[1]: https://public-inbox.org/git/CAPig+cRu6fi_SG7LeTBjAWG8aoCT7LmSipDq2a9bPR0_Ae1pFg@mail.gmail.com/

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v2 0/4] fix "rebase -i --root" corrupting root commit
  2018-07-31 10:05 ` [PATCH v2 0/4] fix "rebase -i --root" corrupting root commit Phillip Wood
@ 2018-07-31 10:46   ` Eric Sunshine
  2018-07-31 11:19     ` Phillip Wood
  2018-07-31 11:27     ` Eric Sunshine
  0 siblings, 2 replies; 57+ messages in thread
From: Eric Sunshine @ 2018-07-31 10:46 UTC (permalink / raw)
  To: Phillip Wood; +Cc: Git List, Johannes Schindelin, Junio C Hamano, Akinori MUSHA

On Tue, Jul 31, 2018 at 6:06 AM Phillip Wood <phillip.wood@talktalk.net> wrote:
> On 31/07/18 08:33, Eric Sunshine wrote:
> > Patch 2/4 of this series conflicts with Akinori MUSHA's
> > 'am/sequencer-author-script-fix' which takes a stab at fixing one of the
> > four (or so) bugs fixed by this series (namely, adding a missing closing
> > quote to GIT_AUTHOR_DATE in "rebase-merge/author-script"). That patch
> > probably ought to be dropped (without prejudice) in favor of this series
> > for the following reasons:
> > [...]
> > The test added by Akinori MUSHA's patch may still have value, and it may
> > make sense to re-submit it, however, doing so need not hold up this
> > (higher priority) series.
>
> Yes I think it does, also the patch that Johannes and I have on top of
> it to fix the quoting of "'" in write_author_script() relies on fixing
> the missing trailing quote and handling of "'" at the same time,
> hopefully we can get that rebased on top of these asap.

I'm not sure if "Yes I think it does" means that Akinori's test has
value or if it means that this series should be held back waiting for
other changes built atop it.

Anyhow, thanks for reading over the series. I appreciate it even if
our "sense of priority" doesn't always align (as evidenced by your
review comments and my responses).

^ permalink raw reply	[flat|nested] 57+ messages in thread

* [PATCH v2 0/2] Fix author script quoting
  2018-07-31  7:33 [PATCH v2 0/4] fix "rebase -i --root" corrupting root commit Eric Sunshine
                   ` (4 preceding siblings ...)
  2018-07-31 10:05 ` [PATCH v2 0/4] fix "rebase -i --root" corrupting root commit Phillip Wood
@ 2018-07-31 11:15 ` Phillip Wood
  2018-07-31 11:15   ` [PATCH v2 1/2] sequencer: handle errors in read_author_ident() Phillip Wood
  2018-07-31 11:15   ` [PATCH v2 2/2] sequencer: fix quoting in write_author_script Phillip Wood
  2018-08-01  1:30 ` [PATCH v2 0/4] fix "rebase -i --root" corrupting root commit Hilco Wijbenga
                   ` (3 subsequent siblings)
  9 siblings, 2 replies; 57+ messages in thread
From: Phillip Wood @ 2018-07-31 11:15 UTC (permalink / raw)
  To: Git Mailing List, Eric Sunshine, Johannes Schindelin,
	Junio C Hamano
  Cc: Phillip Wood

From: Phillip Wood <phillip.wood@dunelm.org.uk>

These build on Eric's patches. The first patch would be better if it
was integrated into Eric's patches. The second is a rebased version of
my previous patch for fixing quoting in the author script with some
additions by Johannes and rebased on top of Eric's fixes. The special
casing for backwards compatibility in the second patch relies on the
quoting and the missing trailing "'" from GIT_AUTHOR_DATE being fixed
concurrently.


Phillip Wood (2):
  sequencer: handle errors in read_author_ident()
  sequencer: fix quoting in write_author_script

 sequencer.c                   | 72 +++++++++++++++++++++--------------
 t/t3404-rebase-interactive.sh | 17 ++++++++-
 2 files changed, 59 insertions(+), 30 deletions(-)

-- 
2.18.0


^ permalink raw reply	[flat|nested] 57+ messages in thread

* [PATCH v2 1/2] sequencer: handle errors in read_author_ident()
  2018-07-31 11:15 ` [PATCH v2 0/2] Fix author script quoting Phillip Wood
@ 2018-07-31 11:15   ` Phillip Wood
  2018-07-31 20:47     ` Eric Sunshine
  2018-07-31 11:15   ` [PATCH v2 2/2] sequencer: fix quoting in write_author_script Phillip Wood
  1 sibling, 1 reply; 57+ messages in thread
From: Phillip Wood @ 2018-07-31 11:15 UTC (permalink / raw)
  To: Git Mailing List, Eric Sunshine, Johannes Schindelin,
	Junio C Hamano
  Cc: Phillip Wood

From: Phillip Wood <phillip.wood@dunelm.org.uk>

The calling code treated NULL as a valid return value, so fix this by
returning and integer and passing in a parameter to receive the author.

Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
---
 sequencer.c | 49 ++++++++++++++++++++++++++-----------------------
 1 file changed, 26 insertions(+), 23 deletions(-)

diff --git a/sequencer.c b/sequencer.c
index 944dea6997..212b912420 100644
--- a/sequencer.c
+++ b/sequencer.c
@@ -701,57 +701,58 @@ static char *get_author(const char *message)
 }
 
 /* Read author-script and return an ident line (author <email> timestamp) */
-static const char *read_author_ident(struct strbuf *buf)
+static int read_author_ident(char **author)
 {
 	const char *keys[] = {
 		"GIT_AUTHOR_NAME=", "GIT_AUTHOR_EMAIL=", "GIT_AUTHOR_DATE="
 	};
+	struct strbuf buf = STRBUF_INIT;
 	struct strbuf out = STRBUF_INIT;
 	char *in, *eol;
 	const char *val[3];
 	int i = 0;
 
-	if (strbuf_read_file(buf, rebase_path_author_script(), 256) <= 0)
-		return NULL;
+	if (strbuf_read_file(&buf, rebase_path_author_script(), 256) <= 0)
+		return -1;
 
 	/* dequote values and construct ident line in-place */
-	for (in = buf->buf; i < 3 && in - buf->buf < buf->len; i++) {
+	for (in = buf.buf; i < 3 && in - buf.buf < buf.len; i++) {
 		if (!skip_prefix(in, keys[i], (const char **)&in)) {
-			warning("could not parse '%s' (looking for '%s'",
-				rebase_path_author_script(), keys[i]);
-			return NULL;
+			strbuf_release(&buf);
+			return error("could not parse '%s' (looking for '%s'",
+				     rebase_path_author_script(), keys[i]);
 		}
-
 		eol = strchrnul(in, '\n');
 		*eol = '\0';
 		if (!sq_dequote(in)) {
-			warning(_("bad quoting on %s value in '%s'"),
-				keys[i], rebase_path_author_script());
-			return NULL;
+			strbuf_release(&buf);
+			return error(_("bad quoting on %s value in '%s'"),
+				     keys[i], rebase_path_author_script());
 		}
 		val[i] = in;
 		in = eol + 1;
 	}
 
 	if (i < 3) {
-		warning("could not parse '%s' (looking for '%s')",
-			rebase_path_author_script(), keys[i]);
-		return NULL;
+		strbuf_release(&buf);
+		return error("could not parse '%s' (looking for '%s')",
+			     rebase_path_author_script(), keys[i]);
 	}
 
 	/* validate date since fmt_ident() will die() on bad value */
 	if (parse_date(val[2], &out)){
-		warning(_("invalid date format '%s' in '%s'"),
+		error(_("invalid date format '%s' in '%s'"),
 			val[2], rebase_path_author_script());
 		strbuf_release(&out);
-		return NULL;
+		strbuf_release(&buf);
+		return -1;
 	}
 
 	strbuf_reset(&out);
 	strbuf_addstr(&out, fmt_ident(val[0], val[1], val[2], 0));
-	strbuf_swap(buf, &out);
-	strbuf_release(&out);
-	return buf->buf;
+	*author = strbuf_detach(&out, NULL);
+	strbuf_release(&buf);
+	return 0;
 }
 
 static const char staged_changes_advice[] =
@@ -794,12 +795,14 @@ static int run_git_commit(const char *defmsg, struct replay_opts *opts,
 	const char *value;
 
 	if ((flags & CREATE_ROOT_COMMIT) && !(flags & AMEND_MSG)) {
-		struct strbuf msg = STRBUF_INIT, script = STRBUF_INIT;
-		const char *author = is_rebase_i(opts) ?
-			read_author_ident(&script) : NULL;
+		struct strbuf msg = STRBUF_INIT;
+		char *author = NULL;
 		struct object_id root_commit, *cache_tree_oid;
 		int res = 0;
 
+		if (is_rebase_i(opts) && read_author_ident(&author))
+			return -1;
+
 		if (!defmsg)
 			BUG("root commit without message");
 
@@ -817,7 +820,7 @@ static int run_git_commit(const char *defmsg, struct replay_opts *opts,
 					  opts->gpg_sign);
 
 		strbuf_release(&msg);
-		strbuf_release(&script);
+		free(author);
 		if (!res) {
 			update_ref(NULL, "CHERRY_PICK_HEAD", &root_commit, NULL,
 				   REF_NO_DEREF, UPDATE_REFS_MSG_ON_ERR);
-- 
2.18.0


^ permalink raw reply related	[flat|nested] 57+ messages in thread

* [PATCH v2 2/2] sequencer: fix quoting in write_author_script
  2018-07-31 11:15 ` [PATCH v2 0/2] Fix author script quoting Phillip Wood
  2018-07-31 11:15   ` [PATCH v2 1/2] sequencer: handle errors in read_author_ident() Phillip Wood
@ 2018-07-31 11:15   ` Phillip Wood
  2018-07-31 21:39     ` Eric Sunshine
  1 sibling, 1 reply; 57+ messages in thread
From: Phillip Wood @ 2018-07-31 11:15 UTC (permalink / raw)
  To: Git Mailing List, Eric Sunshine, Johannes Schindelin,
	Junio C Hamano
  Cc: Phillip Wood

From: Phillip Wood <phillip.wood@dunelm.org.uk>

Single quotes should be escaped as \' not \\'. Note that this only
affects authors that contain a single quote and then only external
scripts that read the author script and users whose git is upgraded from
the shell version of rebase -i while rebase was stopped. This is because
the parsing in read_env_script() expected the broken version and for
some reason sq_dequote() called by read_author_ident() seems to handle
the broken quoting correctly.

Ideally rebase and am would share the same code for reading and
writing the author script, but this commit just fixes the immediate
bug.

The test is based on an original by Akinori MUSHA which was designed to
check that the author script was parsable by posix shell.

Helped-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
---
 sequencer.c                   | 23 +++++++++++++++++------
 t/t3404-rebase-interactive.sh | 17 ++++++++++++++++-
 2 files changed, 33 insertions(+), 7 deletions(-)

diff --git a/sequencer.c b/sequencer.c
index 212b912420..b6e17c1c32 100644
--- a/sequencer.c
+++ b/sequencer.c
@@ -636,21 +636,21 @@ static int write_author_script(const char *message)
 		else if (*message != '\'')
 			strbuf_addch(&buf, *(message++));
 		else
-			strbuf_addf(&buf, "'\\\\%c'", *(message++));
+			strbuf_addf(&buf, "'\\%c'", *(message++));
 	strbuf_addstr(&buf, "'\nGIT_AUTHOR_EMAIL='");
 	while (*message && *message != '\n' && *message != '\r')
 		if (skip_prefix(message, "> ", &message))
 			break;
 		else if (*message != '\'')
 			strbuf_addch(&buf, *(message++));
 		else
-			strbuf_addf(&buf, "'\\\\%c'", *(message++));
+			strbuf_addf(&buf, "'\\%c'", *(message++));
 	strbuf_addstr(&buf, "'\nGIT_AUTHOR_DATE='@");
 	while (*message && *message != '\n' && *message != '\r')
 		if (*message != '\'')
 			strbuf_addch(&buf, *(message++));
 		else
-			strbuf_addf(&buf, "'\\\\%c'", *(message++));
+			strbuf_addf(&buf, "'\\%c'", *(message++));
 	strbuf_addch(&buf, '\'');
 	res = write_message(buf.buf, buf.len, rebase_path_author_script(), 1);
 	strbuf_release(&buf);
@@ -664,14 +664,25 @@ static int write_author_script(const char *message)
 static int read_env_script(struct argv_array *env)
 {
 	struct strbuf script = STRBUF_INIT;
-	int i, count = 0;
-	char *p, *p2;
+	int i, count = 0, sq_bug;
+	const char *p2;
+	char *p;
 
 	if (strbuf_read_file(&script, rebase_path_author_script(), 256) <= 0)
 		return -1;
 
+	/*
+	 * write_author_script() used to fail to terminate the GIT_AUTHOR_DATE
+	 * line with a "'" and also escaped "'" incorrectly as "'\\\\''" rather
+	 * than "'\\''". We check for the terminating "'" on the last line to
+	 * see how "'" has been escaped in case git was upgraded while rebase
+	 * was stopped.
+	 */
+	sq_bug = script.len && script.buf[script.len - 2] != '\'';
 	for (p = script.buf; *p; p++)
-		if (skip_prefix(p, "'\\\\''", (const char **)&p2))
+		if (sq_bug && skip_prefix(p, "'\\\\''", &p2))
+			strbuf_splice(&script, p - script.buf, p2 - p, "'", 1);
+		else if (skip_prefix(p, "'\\''", &p2))
 			strbuf_splice(&script, p - script.buf, p2 - p, "'", 1);
 		else if (*p == '\'')
 			strbuf_splice(&script, p-- - script.buf, 1, "", 0);
diff --git a/t/t3404-rebase-interactive.sh b/t/t3404-rebase-interactive.sh
index b72167ecd5..ae9036d4d9 100755
--- a/t/t3404-rebase-interactive.sh
+++ b/t/t3404-rebase-interactive.sh
@@ -75,6 +75,22 @@ test_expect_success 'rebase --keep-empty' '
 	test_line_count = 6 actual
 '
 
+SQ="'"
+test_expect_success 'rebase -i writes correct author-script' '
+	test_when_finished "test_might_fail git rebase --abort" &&
+	git checkout -b author-with-sq master &&
+	GIT_AUTHOR_NAME="Auth O$SQ R" git commit --allow-empty -m with-sq &&
+	set_fake_editor &&
+	FAKE_LINES="edit 1" git rebase -ki HEAD^ &&
+	tail -n1 .git/rebase-merge/author-script | grep "$SQ\$" &&
+	(
+		sane_unset GIT_AUTHOR_NAME GIT_AUTHOR_EMAIL GIT_AUTHOR_DATE &&
+		. .git/rebase-merge/author-script &&
+		test "$(git show -s --date=raw --format=%an,%ae,@%ad)" = \
+			"Auth O$SQ R,$GIT_AUTHOR_EMAIL,$GIT_AUTHOR_DATE"
+	)
+'
+
 test_expect_success 'rebase -i with the exec command' '
 	git checkout master &&
 	(
@@ -1361,7 +1377,6 @@ test_expect_success 'editor saves as CR/LF' '
 	)
 '
 
-SQ="'"
 test_expect_success 'rebase -i --gpg-sign=<key-id>' '
 	test_when_finished "test_might_fail git rebase --abort" &&
 	set_fake_editor &&
-- 
2.18.0


^ permalink raw reply related	[flat|nested] 57+ messages in thread

* Re: [PATCH v2 0/4] fix "rebase -i --root" corrupting root commit
  2018-07-31 10:46   ` Eric Sunshine
@ 2018-07-31 11:19     ` Phillip Wood
  2018-07-31 11:27     ` Eric Sunshine
  1 sibling, 0 replies; 57+ messages in thread
From: Phillip Wood @ 2018-07-31 11:19 UTC (permalink / raw)
  To: Eric Sunshine, Phillip Wood
  Cc: Git List, Johannes Schindelin, Junio C Hamano, Akinori MUSHA

Hi Eric

On 31/07/18 11:46, Eric Sunshine wrote:
> On Tue, Jul 31, 2018 at 6:06 AM Phillip Wood <phillip.wood@talktalk.net> wrote:
>> On 31/07/18 08:33, Eric Sunshine wrote:
>>> Patch 2/4 of this series conflicts with Akinori MUSHA's
>>> 'am/sequencer-author-script-fix' which takes a stab at fixing one of the
>>> four (or so) bugs fixed by this series (namely, adding a missing closing
>>> quote to GIT_AUTHOR_DATE in "rebase-merge/author-script"). That patch
>>> probably ought to be dropped (without prejudice) in favor of this series
>>> for the following reasons:
>>> [...]
>>> The test added by Akinori MUSHA's patch may still have value, and it may
>>> make sense to re-submit it, however, doing so need not hold up this
>>> (higher priority) series.
>>
>> Yes I think it does, also the patch that Johannes and I have on top of
>> it to fix the quoting of "'" in write_author_script() relies on fixing
>> the missing trailing quote and handling of "'" at the same time,
>> hopefully we can get that rebased on top of these asap.
> 
> I'm not sure if "Yes I think it does" means that Akinori's test has
> value or if it means that this series should be held back waiting for
> other changes built atop it.

It means we should use a test based on that with the quoting fixes on
top of this series and progress them together if possible. I think the
quoting patch I just sent is good now so hopefully there wont be any
issue with it holding up your fixes.

> Anyhow, thanks for reading over the series. I appreciate it even if
> our "sense of priority" doesn't always align (as evidenced by your
> review comments and my responses).

My "sense of priority" was informed by the hope that the quoting patch
wouldn't hold things up (let's hope I was right about that!).

Best Wishes

Phillip


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v2 0/4] fix "rebase -i --root" corrupting root commit
  2018-07-31 10:46   ` Eric Sunshine
  2018-07-31 11:19     ` Phillip Wood
@ 2018-07-31 11:27     ` Eric Sunshine
  1 sibling, 0 replies; 57+ messages in thread
From: Eric Sunshine @ 2018-07-31 11:27 UTC (permalink / raw)
  To: Phillip Wood; +Cc: Git List, Johannes Schindelin, Junio C Hamano, Akinori MUSHA

On Tue, Jul 31, 2018 at 6:46 AM Eric Sunshine <sunshine@sunshineco.com> wrote:
> Anyhow, thanks for reading over the series. I appreciate it even if
> our "sense of priority" doesn't always align (as evidenced by your
> review comments and my responses).

To be clear, the changes you suggest all make sense, and would be
welcome (especially the bug fixes), but I consider them lower priority
than the fixes in this series, and here's why:

The commit object corruption caused by the bugs fixed by this series
are unavoidable. Anyone using "rebase -i --root" to swap in a new
commit as root is going to end up with a corrupt repository no matter
what. There's no way to side step it. And, most people won't know how
to fix the corruption, assuming they even notice it.

If I understand correctly, the issues you describe are unlikely to
come up in practice. The only way they can arise is if someone hand
edits the script (something only power users will do) _and_ botches
the edit in the process, or if the person's name contains an
apostrophe (possible, though perhaps uncommon?). Also, (if again I
understand correctly) they are only data "corruptions", not genuine
broken-repository corruptions, thus are more likely to be fixable by a
typical user.

So it's not that I think your proposed fixes and suggestions are
unimportant, I just don't think they belong in this series, and would
be happy to see them atop it.

Thanks.

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v2 1/2] sequencer: handle errors in read_author_ident()
  2018-07-31 11:15   ` [PATCH v2 1/2] sequencer: handle errors in read_author_ident() Phillip Wood
@ 2018-07-31 20:47     ` Eric Sunshine
  2018-08-01  9:28       ` Phillip Wood
  0 siblings, 1 reply; 57+ messages in thread
From: Eric Sunshine @ 2018-07-31 20:47 UTC (permalink / raw)
  To: Phillip Wood; +Cc: Git List, Johannes Schindelin, Junio C Hamano

On Tue, Jul 31, 2018 at 7:15 AM Phillip Wood <phillip.wood@talktalk.net> wrote:
> The calling code treated NULL as a valid return value, so fix this by
> returning and integer and passing in a parameter to receive the author.

It might be difficult for future readers (those who didn't follow the
discussion) to understand how/why NULL is not sufficient to signal an
error. Perhaps incorporating the explanation from your email[1] which
discussed that the author name, email, and/or date might change
unexpectedly would be sufficient. This excerpt from [1] might be a
good starting point:

    ... the caller does not treat NULL as an error, so this will
    change the date and potentially the author of the commit
    ... [which] does corrupt the author data compared to its
    expected value.

[1]: https://public-inbox.org/git/c80cf729-1bbe-10f5-6837-b074d371b91c@talktalk.net/

> Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
> ---
> diff --git a/sequencer.c b/sequencer.c
> @@ -701,57 +701,58 @@ static char *get_author(const char *message)
> -static const char *read_author_ident(struct strbuf *buf)
> +static int read_author_ident(char **author)

So, the caller is now responsible for freeing the string placed in
'author'. Okay.

>  {
> -       if (strbuf_read_file(buf, rebase_path_author_script(), 256) <= 0)
> -               return NULL;
> +       if (strbuf_read_file(&buf, rebase_path_author_script(), 256) <= 0)
> +               return -1;

I think you need to strbuf_release(&buf) in this error path since
strbuf_read_file() doesn't guarantee that the strbuf hasn't been
partially populated when it returns an error. (That is, this is
leaking.)

>         /* dequote values and construct ident line in-place */

Ugh, this comment should have been adjusted in my series. A minor
matter, though, which can be tweaked later.

>         /* validate date since fmt_ident() will die() on bad value */
>         if (parse_date(val[2], &out)){
> -               warning(_("invalid date format '%s' in '%s'"),
> +               error(_("invalid date format '%s' in '%s'"),
>                         val[2], rebase_path_author_script());
>                 strbuf_release(&out);
> -               return NULL;
> +               strbuf_release(&buf);
> +               return -1;

You were careful to print the error, which references a value from
'buf', before destroying 'buf'. Good.

(A simplifying alternative would have been to not print the actual
value and instead say generally that "the date" was bad. Not a big
deal.)

>         }
> -       strbuf_swap(buf, &out);
> -       strbuf_release(&out);
> -       return buf->buf;
> +       *author = strbuf_detach(&out, NULL);

And, 'author' is only assigned when 0 is returned, so the caller only
has to free(author) upon success. Fine.

> +       strbuf_release(&buf);
> +       return 0;
>  }
>
>  static const char staged_changes_advice[] =
> @@ -794,12 +795,14 @@ static int run_git_commit(const char *defmsg, struct replay_opts *opts,
> -               struct strbuf msg = STRBUF_INIT, script = STRBUF_INIT;
> -               const char *author = is_rebase_i(opts) ?
> -                       read_author_ident(&script) : NULL;
> +               struct strbuf msg = STRBUF_INIT;
> +               char *author = NULL;
>                 struct object_id root_commit, *cache_tree_oid;
>                 int res = 0;
>
> +               if (is_rebase_i(opts) && read_author_ident(&author))
> +                       return -1;

Logic looks correct, and it's nice to see that you went with 'return
-1' rather than die(), especially since the caller of run_git_commit()
is already able to handle -1.

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v2 2/2] sequencer: fix quoting in write_author_script
  2018-07-31 11:15   ` [PATCH v2 2/2] sequencer: fix quoting in write_author_script Phillip Wood
@ 2018-07-31 21:39     ` Eric Sunshine
  2018-08-01 10:24       ` Phillip Wood
  2018-08-01 15:50       ` Phillip Wood
  0 siblings, 2 replies; 57+ messages in thread
From: Eric Sunshine @ 2018-07-31 21:39 UTC (permalink / raw)
  To: Phillip Wood; +Cc: Git List, Johannes Schindelin, Junio C Hamano

On Tue, Jul 31, 2018 at 7:15 AM Phillip Wood <phillip.wood@talktalk.net> wrote:
> Single quotes should be escaped as \' not \\'. Note that this only
> affects authors that contain a single quote and then only external
> scripts that read the author script and users whose git is upgraded from
> the shell version of rebase -i while rebase was stopped. This is because
> the parsing in read_env_script() expected the broken version and for
> some reason sq_dequote() called by read_author_ident() seems to handle
> the broken quoting correctly.

Is the:

    ...for some reason sq_dequote() called by read_author_ident()
    seems to handle the broken quoting correctly.

bit outdated? We know now from patch 2/4 of my series[1] that
read_author_ident() wasn't handling it correctly at all. It was merely
ignoring the return value from sq_dequote() and using whatever broken
value came back from it.

[1]: https://public-inbox.org/git/20180731073331.40007-3-sunshine@sunshineco.com/

> Helped-by: Johannes Schindelin <johannes.schindelin@gmx.de>
> Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
> ---
> diff --git a/sequencer.c b/sequencer.c
> @@ -664,14 +664,25 @@ static int write_author_script(const char *message)
>  static int read_env_script(struct argv_array *env)
>  {
>         if (strbuf_read_file(&script, rebase_path_author_script(), 256) <= 0)
>                 return -1;

This is not a problem introduced by this patch, but since
strbuf_read_file() doesn't guarantee that memory hasn't been allocated
when it returns an error, this is leaking.

> +       /*
> +        * write_author_script() used to fail to terminate the GIT_AUTHOR_DATE
> +        * line with a "'" and also escaped "'" incorrectly as "'\\\\''" rather
> +        * than "'\\''". We check for the terminating "'" on the last line to
> +        * see how "'" has been escaped in case git was upgraded while rebase
> +        * was stopped.
> +        */
> +       sq_bug = script.len && script.buf[script.len - 2] != '\'';

I think you need to be checking 'script.len > 1', not just
'script.len', otherwise you might access memory outside the allocated
buffer.

This is a very "delicate" check, assuming that a hand-edited file
won't end with, say, an extra newline. I wonder if this level of
backward-compatibility is overkill for such an unlikely case.

>         for (p = script.buf; *p; p++)
> -               if (skip_prefix(p, "'\\\\''", (const char **)&p2))
> +               if (sq_bug && skip_prefix(p, "'\\\\''", &p2))
> +                       strbuf_splice(&script, p - script.buf, p2 - p, "'", 1);
> +               else if (skip_prefix(p, "'\\''", &p2))
> diff --git a/t/t3404-rebase-interactive.sh b/t/t3404-rebase-interactive.sh
> @@ -75,6 +75,22 @@ test_expect_success 'rebase --keep-empty' '
> +test_expect_success 'rebase -i writes correct author-script' '
> +       test_when_finished "test_might_fail git rebase --abort" &&
> +       git checkout -b author-with-sq master &&
> +       GIT_AUTHOR_NAME="Auth O$SQ R" git commit --allow-empty -m with-sq &&
> +       set_fake_editor &&
> +       FAKE_LINES="edit 1" git rebase -ki HEAD^ &&

Hmph, -k doesn't seem to be documented in git-rebase.txt. Is it needed here?

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v2 0/4] fix "rebase -i --root" corrupting root commit
  2018-07-31  7:33 [PATCH v2 0/4] fix "rebase -i --root" corrupting root commit Eric Sunshine
                   ` (5 preceding siblings ...)
  2018-07-31 11:15 ` [PATCH v2 0/2] Fix author script quoting Phillip Wood
@ 2018-08-01  1:30 ` Hilco Wijbenga
  2018-08-01  6:22   ` Eric Sunshine
  2018-08-01 23:25 ` brian m. carlson
                   ` (2 subsequent siblings)
  9 siblings, 1 reply; 57+ messages in thread
From: Hilco Wijbenga @ 2018-08-01  1:30 UTC (permalink / raw)
  To: Eric Sunshine
  Cc: Git Users, Johannes Schindelin, Junio C Hamano, Phillip Wood,
	Akinori MUSHA

Hi Eric,

On Tue, Jul 31, 2018 at 12:33 AM, Eric Sunshine <sunshine@sunshineco.com> wrote:
> This is a re-roll of [1] which fixes sequencer bugs resulting in commit
> object corruption when "rebase -i --root" swaps in a new commit as root.
> Unfortunately, those bugs made it into v2.18.0 and have already
> corrupted at least one repository (a local project of mine). Patches 3/4
> and 4/4 are new.
>
> v1 fixed these bugs:
>
> * trailing garbage on the commit's "author" header
>
> * extra trailing digit on "author" header's timezone (caused by two
>   separate bugs)
>
> v2 fixes those same bugs, plus:
>
> * eliminates a bogus "@" prepended to the "author" header timestamp
>   which renders the header corrupt
>
> * takes care to validate information coming from
>   "rebase-merge/author-script" before incorporating it into the "author"
>   header since that file may be hand-edited, and bogus hand-edited
>   values could corrupt the commit object.
>

Does this also fix losing the initial commit if it is empty?

Given

git init ; git commit -m 'Initial commit' --allow-empty ; touch
file.txt ; git add file.txt ; git commit -m 'Add file.txt' ; git
rebase --root

I would expect there to be 2 commits but the first one has
disappeared. (This usually happens with "git rebase -i --root" early
on in a new project.)

Cheers,
Hilco

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v2 0/4] fix "rebase -i --root" corrupting root commit
  2018-08-01  1:30 ` [PATCH v2 0/4] fix "rebase -i --root" corrupting root commit Hilco Wijbenga
@ 2018-08-01  6:22   ` Eric Sunshine
  2018-08-07  1:19     ` Hilco Wijbenga
  0 siblings, 1 reply; 57+ messages in thread
From: Eric Sunshine @ 2018-08-01  6:22 UTC (permalink / raw)
  To: Hilco Wijbenga
  Cc: Git List, Johannes Schindelin, Junio C Hamano, Phillip Wood,
	Akinori MUSHA

On Tue, Jul 31, 2018 at 9:30 PM Hilco Wijbenga <hilco.wijbenga@gmail.com> wrote:
> On Tue, Jul 31, 2018 at 12:33 AM, Eric Sunshine <sunshine@sunshineco.com> wrote:
> > This is a re-roll of [1] which fixes sequencer bugs resulting in commit
> > object corruption when "rebase -i --root" swaps in a new commit as root.
> > Unfortunately, those bugs made it into v2.18.0 and have already
> > corrupted at least one repository (a local project of mine). Patches 3/4
> > and 4/4 are new.
>
> Does this also fix losing the initial commit if it is empty?
>
> git init ; git commit -m 'Initial commit' --allow-empty ; touch
> file.txt ; git add file.txt ; git commit -m 'Add file.txt' ; git
> rebase --root
>
> I would expect there to be 2 commits but the first one has
> disappeared. (This usually happens with "git rebase -i --root" early
> on in a new project.)

This series should have no impact on that issue; it is fixing root
commit object corruption, and does not touch or change how the commit
graph is maintained or how the rebasing machinery itself works (and
the --allow-empty stuff is part of that machinery).

As Dscho is cc:'d already, perhaps he can chime in as a resident expert.

By the way, what happens if you use --keep-empty while rebasing?

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v2 1/2] sequencer: handle errors in read_author_ident()
  2018-07-31 20:47     ` Eric Sunshine
@ 2018-08-01  9:28       ` Phillip Wood
  0 siblings, 0 replies; 57+ messages in thread
From: Phillip Wood @ 2018-08-01  9:28 UTC (permalink / raw)
  To: Eric Sunshine, Phillip Wood; +Cc: Git List, Johannes Schindelin, Junio C Hamano

Hi Eric

Thanks for taking a look at this

On 31/07/18 21:47, Eric Sunshine wrote:
> On Tue, Jul 31, 2018 at 7:15 AM Phillip Wood <phillip.wood@talktalk.net> wrote:
>> The calling code treated NULL as a valid return value, so fix this by
>> returning and integer and passing in a parameter to receive the author.
> 
> It might be difficult for future readers (those who didn't follow the
> discussion) to understand how/why NULL is not sufficient to signal an
> error. Perhaps incorporating the explanation from your email[1] which
> discussed that the author name, email, and/or date might change
> unexpectedly would be sufficient. This excerpt from [1] might be a
> good starting point:
> 
>      ... the caller does not treat NULL as an error, so this will
>      change the date and potentially the author of the commit
>      ... [which] does corrupt the author data compared to its
>      expected value.
> 
> [1]: https://public-inbox.org/git/c80cf729-1bbe-10f5-6837-b074d371b91c@talktalk.net/
> 
>> Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
>> ---
>> diff --git a/sequencer.c b/sequencer.c
>> @@ -701,57 +701,58 @@ static char *get_author(const char *message)
>> -static const char *read_author_ident(struct strbuf *buf)
>> +static int read_author_ident(char **author)
> 
> So, the caller is now responsible for freeing the string placed in
> 'author'. Okay.
> 
>>   {
>> -       if (strbuf_read_file(buf, rebase_path_author_script(), 256) <= 0)
>> -               return NULL;
>> +       if (strbuf_read_file(&buf, rebase_path_author_script(), 256) <= 0)
>> +               return -1;
> 
> I think you need to strbuf_release(&buf) in this error path since
> strbuf_read_file() doesn't guarantee that the strbuf hasn't been
> partially populated when it returns an error. (That is, this is
> leaking.)

Good point, I'll fix it

>>          /* dequote values and construct ident line in-place */
> 
> Ugh, this comment should have been adjusted in my series. A minor
> matter, though, which can be tweaked later.
> 
>>          /* validate date since fmt_ident() will die() on bad value */
>>          if (parse_date(val[2], &out)){
>> -               warning(_("invalid date format '%s' in '%s'"),
>> +               error(_("invalid date format '%s' in '%s'"),
>>                          val[2], rebase_path_author_script());
>>                  strbuf_release(&out);
>> -               return NULL;
>> +               strbuf_release(&buf);
>> +               return -1;
> 
> You were careful to print the error, which references a value from
> 'buf', before destroying 'buf'. Good.
> 
> (A simplifying alternative would have been to not print the actual
> value and instead say generally that "the date" was bad. Not a big
> deal.)
> 
>>          }
>> -       strbuf_swap(buf, &out);
>> -       strbuf_release(&out);
>> -       return buf->buf;
>> +       *author = strbuf_detach(&out, NULL);
> 
> And, 'author' is only assigned when 0 is returned, so the caller only
> has to free(author) upon success. Fine.
> 
>> +       strbuf_release(&buf);
>> +       return 0;
>>   }
>>
>>   static const char staged_changes_advice[] =
>> @@ -794,12 +795,14 @@ static int run_git_commit(const char *defmsg, struct replay_opts *opts,
>> -               struct strbuf msg = STRBUF_INIT, script = STRBUF_INIT;
>> -               const char *author = is_rebase_i(opts) ?
>> -                       read_author_ident(&script) : NULL;
>> +               struct strbuf msg = STRBUF_INIT;
>> +               char *author = NULL;
>>                  struct object_id root_commit, *cache_tree_oid;
>>                  int res = 0;
>>
>> +               if (is_rebase_i(opts) && read_author_ident(&author))
>> +                       return -1;
> 
> Logic looks correct, and it's nice to see that you went with 'return
> -1' rather than die(), especially since the caller of run_git_commit()
> is already able to handle -1.

Yes, it reschedules the pick so the user has a chance to fix the 
author-script and then run 'git rebase --continue'

Best Wishes

Phillip


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v2 2/2] sequencer: fix quoting in write_author_script
  2018-07-31 21:39     ` Eric Sunshine
@ 2018-08-01 10:24       ` Phillip Wood
  2018-08-01 15:22         ` Junio C Hamano
  2018-08-01 15:50       ` Phillip Wood
  1 sibling, 1 reply; 57+ messages in thread
From: Phillip Wood @ 2018-08-01 10:24 UTC (permalink / raw)
  To: Eric Sunshine, Phillip Wood; +Cc: Git List, Johannes Schindelin, Junio C Hamano

Hi Eric

On 31/07/18 22:39, Eric Sunshine wrote:
> On Tue, Jul 31, 2018 at 7:15 AM Phillip Wood <phillip.wood@talktalk.net> wrote:
>> Single quotes should be escaped as \' not \\'. Note that this only
>> affects authors that contain a single quote and then only external
>> scripts that read the author script and users whose git is upgraded from
>> the shell version of rebase -i while rebase was stopped. This is because
>> the parsing in read_env_script() expected the broken version and for
>> some reason sq_dequote() called by read_author_ident() seems to handle
>> the broken quoting correctly.
> 
> Is the:
> 
>      ...for some reason sq_dequote() called by read_author_ident()
>      seems to handle the broken quoting correctly.
> 
> bit outdated? We know now from patch 2/4 of my series[1] that
> read_author_ident() wasn't handling it correctly at all. It was merely
> ignoring the return value from sq_dequote() and using whatever broken
> value came back from it.

Yes you're right, when I tested it before I must of had GIT_AUTHOR_NAME 
set to the name with the "'" in it when I ran the rebase because it 
appeared to work, but actually sj_dequote() was returning NULL and so 
commit_tree() just picked up the default author. I've just changed the 
test you added to

test_expect_success 'valid author header after --root swap' '
	rebase_setup_and_clean author-header no-conflict-branch &&
	set_fake_editor &&
	git commit --amend --author="Au ${SQ}thor <author@example.com>" 
--no-edit &&
	git cat-file commit HEAD | grep ^author >expected &&
	FAKE_LINES="5 1" git rebase -i --root &&
	git cat-file commit HEAD^ | grep ^author >actual &&
	test_cmp expected actual
'

and it fails without the fixes to write_author_script().

> 
> [1]: https://public-inbox.org/git/20180731073331.40007-3-sunshine@sunshineco.com/
> 
>> Helped-by: Johannes Schindelin <johannes.schindelin@gmx.de>
>> Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
>> ---
>> diff --git a/sequencer.c b/sequencer.c
>> @@ -664,14 +664,25 @@ static int write_author_script(const char *message)
>>   static int read_env_script(struct argv_array *env)
>>   {
>>          if (strbuf_read_file(&script, rebase_path_author_script(), 256) <= 0)
>>                  return -1;
> 
> This is not a problem introduced by this patch, but since
> strbuf_read_file() doesn't guarantee that memory hasn't been allocated
> when it returns an error, this is leaking.

I can fix that

>> +       /*
>> +        * write_author_script() used to fail to terminate the GIT_AUTHOR_DATE
>> +        * line with a "'" and also escaped "'" incorrectly as "'\\\\''" rather
>> +        * than "'\\''". We check for the terminating "'" on the last line to
>> +        * see how "'" has been escaped in case git was upgraded while rebase
>> +        * was stopped.
>> +        */
>> +       sq_bug = script.len && script.buf[script.len - 2] != '\'';
> 
> I think you need to be checking 'script.len > 1', not just
> 'script.len', otherwise you might access memory outside the allocated
> buffer.

Good catch, Johannes's original was checking script.buf[script.len - 1] 
which I corrected but forget to adjust the previous check.

> This is a very "delicate" check, assuming that a hand-edited file
> won't end with, say, an extra newline. I wonder if this level of
> backward-compatibility is overkill for such an unlikely case.

Yes, it is a bit fragile. Originally the patch just unquoted the correct 
and incorrect quoting but Johannes was worried that might lead to errors 
and suggested this check. The check is aimed at people whose git gets 
upgraded while rebase is stopped for a conflict resolution or edit and 
so have the bad quoting in the author-script from the old version of git 
which started the rebase. Authors with "'" in the name are uncommon but 
not unheard of, I think when I checked there were about half a dozen in 
git's history. I'm not sure what to do for the best.

>>          for (p = script.buf; *p; p++)
>> -               if (skip_prefix(p, "'\\\\''", (const char **)&p2))
>> +               if (sq_bug && skip_prefix(p, "'\\\\''", &p2))
>> +                       strbuf_splice(&script, p - script.buf, p2 - p, "'", 1);
>> +               else if (skip_prefix(p, "'\\''", &p2))
>> diff --git a/t/t3404-rebase-interactive.sh b/t/t3404-rebase-interactive.sh
>> @@ -75,6 +75,22 @@ test_expect_success 'rebase --keep-empty' '
>> +test_expect_success 'rebase -i writes correct author-script' '
>> +       test_when_finished "test_might_fail git rebase --abort" &&
>> +       git checkout -b author-with-sq master &&
>> +       GIT_AUTHOR_NAME="Auth O$SQ R" git commit --allow-empty -m with-sq &&
>> +       set_fake_editor &&
>> +       FAKE_LINES="edit 1" git rebase -ki HEAD^ &&
> 
> Hmph, -k doesn't seem to be documented in git-rebase.txt. Is it needed here?

-k is short for --keep-empty which is needed as the test creates an 
empty commit to check the author (I think that is to avoid changing the 
tree  - Johannes wrote that bit).

Thanks for your comments, I'll do a reroll

Phillip

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v2 2/2] sequencer: fix quoting in write_author_script
  2018-08-01 10:24       ` Phillip Wood
@ 2018-08-01 15:22         ` Junio C Hamano
  0 siblings, 0 replies; 57+ messages in thread
From: Junio C Hamano @ 2018-08-01 15:22 UTC (permalink / raw)
  To: Phillip Wood; +Cc: Eric Sunshine, Phillip Wood, Git List, Johannes Schindelin

Phillip Wood <phillip.wood@talktalk.net> writes:

>> Is the:
>>
>>      ...for some reason sq_dequote() called by read_author_ident()
>>      seems to handle the broken quoting correctly.
>>
>> bit outdated? We know now from patch 2/4 of my series[1] that
>> read_author_ident() wasn't handling it correctly at all. It was merely
>> ignoring the return value from sq_dequote() and using whatever broken
>> value came back from it.
>
> Yes you're right, when I tested it...
>
> Thanks for your comments, I'll do a reroll

Thanks, both.  Sounds like we are quickly converging to the
resolution ;-)

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v2 2/2] sequencer: fix quoting in write_author_script
  2018-07-31 21:39     ` Eric Sunshine
  2018-08-01 10:24       ` Phillip Wood
@ 2018-08-01 15:50       ` Phillip Wood
  2018-08-01 19:19         ` Eric Sunshine
  1 sibling, 1 reply; 57+ messages in thread
From: Phillip Wood @ 2018-08-01 15:50 UTC (permalink / raw)
  To: Eric Sunshine, Phillip Wood; +Cc: Git List, Johannes Schindelin, Junio C Hamano

On 31/07/18 22:39, Eric Sunshine wrote:
> On Tue, Jul 31, 2018 at 7:15 AM Phillip Wood <phillip.wood@talktalk.net> wrote:
>> Single quotes should be escaped as \' not \\'. Note that this only
>> affects authors that contain a single quote and then only external
>> scripts that read the author script and users whose git is upgraded from
>> the shell version of rebase -i while rebase was stopped. This is because
>> the parsing in read_env_script() expected the broken version and for
>> some reason sq_dequote() called by read_author_ident() seems to handle
>> the broken quoting correctly.
> 
> Is the:
> 
>      ...for some reason sq_dequote() called by read_author_ident()
>      seems to handle the broken quoting correctly.
> 
> bit outdated? We know now from patch 2/4 of my series[1] that
> read_author_ident() wasn't handling it correctly at all. It was merely
> ignoring the return value from sq_dequote() and using whatever broken
> value came back from it.
> 
> [1]: https://public-inbox.org/git/20180731073331.40007-3-sunshine@sunshineco.com/
> 
>> Helped-by: Johannes Schindelin <johannes.schindelin@gmx.de>
>> Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
>> ---
>> diff --git a/sequencer.c b/sequencer.c
>> @@ -664,14 +664,25 @@ static int write_author_script(const char *message)
>>   static int read_env_script(struct argv_array *env)
>>   {
>>          if (strbuf_read_file(&script, rebase_path_author_script(), 256) <= 0)
>>                  return -1;
> 
> This is not a problem introduced by this patch, but since
> strbuf_read_file() doesn't guarantee that memory hasn't been allocated
> when it returns an error, this is leaking.
> 
>> +       /*
>> +        * write_author_script() used to fail to terminate the GIT_AUTHOR_DATE
>> +        * line with a "'" and also escaped "'" incorrectly as "'\\\\''" rather
>> +        * than "'\\''". We check for the terminating "'" on the last line to
>> +        * see how "'" has been escaped in case git was upgraded while rebase
>> +        * was stopped.
>> +        */
>> +       sq_bug = script.len && script.buf[script.len - 2] != '\'';
> 
> I think you need to be checking 'script.len > 1', not just
> 'script.len', otherwise you might access memory outside the allocated
> buffer.
> 
> This is a very "delicate" check, assuming that a hand-edited file
> won't end with, say, an extra newline. I wonder if this level of
> backward-compatibility is overkill for such an unlikely case.

I think I'll get rid of the check and instead use a version number 
written to .git/rebase-merge/interactive to indicate if we need to fix 
the quoting (if there's no number then it needs fixing). We can 
increment the version number in the future if we ever need to implement 
other fallbacks to handle the case where git got upgraded while rebase 
was stopped. I'll send a patch tomorrow

Best Wishes

Phillip

> 
>>          for (p = script.buf; *p; p++)
>> -               if (skip_prefix(p, "'\\\\''", (const char **)&p2))
>> +               if (sq_bug && skip_prefix(p, "'\\\\''", &p2))
>> +                       strbuf_splice(&script, p - script.buf, p2 - p, "'", 1);
>> +               else if (skip_prefix(p, "'\\''", &p2))
>> diff --git a/t/t3404-rebase-interactive.sh b/t/t3404-rebase-interactive.sh
>> @@ -75,6 +75,22 @@ test_expect_success 'rebase --keep-empty' '
>> +test_expect_success 'rebase -i writes correct author-script' '
>> +       test_when_finished "test_might_fail git rebase --abort" &&
>> +       git checkout -b author-with-sq master &&
>> +       GIT_AUTHOR_NAME="Auth O$SQ R" git commit --allow-empty -m with-sq &&
>> +       set_fake_editor &&
>> +       FAKE_LINES="edit 1" git rebase -ki HEAD^ &&
> 
> Hmph, -k doesn't seem to be documented in git-rebase.txt. Is it needed here?
> 


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v2 2/2] sequencer: fix quoting in write_author_script
  2018-08-01 15:50       ` Phillip Wood
@ 2018-08-01 19:19         ` Eric Sunshine
  0 siblings, 0 replies; 57+ messages in thread
From: Eric Sunshine @ 2018-08-01 19:19 UTC (permalink / raw)
  To: Phillip Wood; +Cc: Git List, Johannes Schindelin, Junio C Hamano

On Wed, Aug 1, 2018 at 11:50 AM Phillip Wood <phillip.wood@talktalk.net> wrote:
> On 31/07/18 22:39, Eric Sunshine wrote:
> > On Tue, Jul 31, 2018 at 7:15 AM Phillip Wood <phillip.wood@talktalk.net> wrote:
> >> +       /*
> >> +        * write_author_script() used to fail to terminate the GIT_AUTHOR_DATE
> >> +        * line with a "'" and also escaped "'" incorrectly as "'\\\\''" rather
> >> +        * than "'\\''". We check for the terminating "'" on the last line to
> >> +        * see how "'" has been escaped in case git was upgraded while rebase
> >> +        * was stopped.
> >> +        */
> >> +       sq_bug = script.len && script.buf[script.len - 2] != '\'';
> >
> > This is a very "delicate" check, assuming that a hand-edited file
> > won't end with, say, an extra newline. I wonder if this level of
> > backward-compatibility is overkill for such an unlikely case.
>
> I think I'll get rid of the check and instead use a version number
> written to .git/rebase-merge/interactive to indicate if we need to fix
> the quoting (if there's no number then it needs fixing). We can
> increment the version number in the future if we ever need to implement
> other fallbacks to handle the case where git got upgraded while rebase
> was stopped. I'll send a patch tomorrow

Hmm, that approach is pretty heavyweight and would add a fair bit of
new code and complexity which itself could harbor bugs. When I
commented that the check was "delicate", I was (especially) referring
to the rigid "script[len-2]", not necessarily to the basic idea of the
check. So, you could keep the check (in spirit) but make it more
robust. Like this, for instance:

/* big comment explaining old buggy stuff */
static int broken_quoting(const char *s, size_t n)
{
    const char *t = s + n;
    while (t > s && *--t == '\n')
        /* empty */;
    if (t > s && *--t != '\'')
        return 1;
    return 0;
}

static int read_env_script(...)
{
    ...
    sq_bug = broken_quoting(script.buf, script.len);
    ...
}

I would feel much more comfortable with a simple solution like this
than with one introducing new complexity associated with adding a
version number.

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v2 0/4] fix "rebase -i --root" corrupting root commit
  2018-07-31  7:33 [PATCH v2 0/4] fix "rebase -i --root" corrupting root commit Eric Sunshine
                   ` (6 preceding siblings ...)
  2018-08-01  1:30 ` [PATCH v2 0/4] fix "rebase -i --root" corrupting root commit Hilco Wijbenga
@ 2018-08-01 23:25 ` brian m. carlson
  2018-08-02  8:09   ` Eric Sunshine
  2018-08-02 11:20 ` [PATCH v3 0/2] Fix author script quoting Phillip Wood
  2018-08-07  9:34 ` [PATCH v4 0/2] fix author-script quoting Phillip Wood
  9 siblings, 1 reply; 57+ messages in thread
From: brian m. carlson @ 2018-08-01 23:25 UTC (permalink / raw)
  To: Eric Sunshine
  Cc: git, Johannes Schindelin, Junio C Hamano, Phillip Wood,
	Akinori MUSHA

[-- Attachment #1: Type: text/plain, Size: 1216 bytes --]

On Tue, Jul 31, 2018 at 03:33:27AM -0400, Eric Sunshine wrote:
> This is a re-roll of [1] which fixes sequencer bugs resulting in commit
> object corruption when "rebase -i --root" swaps in a new commit as root.
> Unfortunately, those bugs made it into v2.18.0 and have already
> corrupted at least one repository (a local project of mine). Patches 3/4
> and 4/4 are new.
> 
> v1 fixed these bugs:
> 
> * trailing garbage on the commit's "author" header
> 
> * extra trailing digit on "author" header's timezone (caused by two
>   separate bugs)
> 
> v2 fixes those same bugs, plus:
> 
> * eliminates a bogus "@" prepended to the "author" header timestamp
>   which renders the header corrupt
> 
> * takes care to validate information coming from
>   "rebase-merge/author-script" before incorporating it into the "author"
>   header since that file may be hand-edited, and bogus hand-edited
>   values could corrupt the commit object.

I looked at this series and it seems sane and logical to me.  Thanks for
acting quickly to fix the corruption.

Reviewed-by: brian m. carlson <sandals@crustytoothpaste.net>
-- 
brian m. carlson: Houston, Texas, US
OpenPGP: https://keybase.io/bk2204

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 867 bytes --]

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v2 0/4] fix "rebase -i --root" corrupting root commit
  2018-08-01 23:25 ` brian m. carlson
@ 2018-08-02  8:09   ` Eric Sunshine
  0 siblings, 0 replies; 57+ messages in thread
From: Eric Sunshine @ 2018-08-02  8:09 UTC (permalink / raw)
  To: brian m. carlson, Git List, Johannes Schindelin, Junio C Hamano,
	Phillip Wood, Akinori MUSHA

On Wed, Aug 1, 2018 at 7:25 PM brian m. carlson
<sandals@crustytoothpaste.net> wrote:
> On Tue, Jul 31, 2018 at 03:33:27AM -0400, Eric Sunshine wrote:
> > This is a re-roll of [1] which fixes sequencer bugs resulting in commit
> > object corruption when "rebase -i --root" swaps in a new commit as root.
> > Unfortunately, those bugs made it into v2.18.0 and have already
> > corrupted at least one repository (a local project of mine). Patches 3/4
> > and 4/4 are new.
>
> I looked at this series and it seems sane and logical to me.  Thanks for
> acting quickly to fix the corruption.
>
> Reviewed-by: brian m. carlson <sandals@crustytoothpaste.net>

Thanks for the review.

^ permalink raw reply	[flat|nested] 57+ messages in thread

* [PATCH v3 0/2] Fix author script quoting
  2018-07-31  7:33 [PATCH v2 0/4] fix "rebase -i --root" corrupting root commit Eric Sunshine
                   ` (7 preceding siblings ...)
  2018-08-01 23:25 ` brian m. carlson
@ 2018-08-02 11:20 ` Phillip Wood
  2018-08-02 11:20   ` [PATCH v3 1/2] sequencer: handle errors in read_author_ident() Phillip Wood
  2018-08-02 11:20   ` [PATCH v3 2/2] sequencer: fix quoting in write_author_script Phillip Wood
  2018-08-07  9:34 ` [PATCH v4 0/2] fix author-script quoting Phillip Wood
  9 siblings, 2 replies; 57+ messages in thread
From: Phillip Wood @ 2018-08-02 11:20 UTC (permalink / raw)
  To: Git Mailing List, Eric Sunshine
  Cc: Junio C Hamano, Johannes Schindelin, Phillip Wood

From: Phillip Wood <phillip.wood@dunelm.org.uk>

Thanks to Eric for his comments on v2. The first patch hasn't changed
much, the second one quite a bit. See the individual patches for a
list of changes

Phillip Wood (2):
  sequencer: handle errors in read_author_ident()
  sequencer: fix quoting in write_author_script

 git-rebase--interactive.sh    |   2 +-
 sequencer.c                   | 137 ++++++++++++++++++++++++++--------
 sequencer.h                   |   1 +
 t/t3404-rebase-interactive.sh |  20 ++++-
 4 files changed, 123 insertions(+), 37 deletions(-)

-- 
2.18.0


^ permalink raw reply	[flat|nested] 57+ messages in thread

* [PATCH v3 1/2] sequencer: handle errors in read_author_ident()
  2018-08-02 11:20 ` [PATCH v3 0/2] Fix author script quoting Phillip Wood
@ 2018-08-02 11:20   ` Phillip Wood
  2018-08-03  7:09     ` Eric Sunshine
  2018-08-02 11:20   ` [PATCH v3 2/2] sequencer: fix quoting in write_author_script Phillip Wood
  1 sibling, 1 reply; 57+ messages in thread
From: Phillip Wood @ 2018-08-02 11:20 UTC (permalink / raw)
  To: Git Mailing List, Eric Sunshine
  Cc: Junio C Hamano, Johannes Schindelin, Phillip Wood

From: Phillip Wood <phillip.wood@dunelm.org.uk>

The calling code did not treat NULL as an error. Instead NULL caused
it to fallback to using the default author when creating the new
commit. This changed the date and potentially the author of the
commit which corrupted the author data compared to its expected
value. Fix this by returning and integer and passing in a parameter to
receive the author.

Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
---

Notes:
    changes since v2:
    
     - Improved commit message
     - Fixed memory leak
     - Translated a couple of error messages

 sequencer.c | 52 ++++++++++++++++++++++++++++------------------------
 1 file changed, 28 insertions(+), 24 deletions(-)

diff --git a/sequencer.c b/sequencer.c
index 944dea6997..1bf8b0c431 100644
--- a/sequencer.c
+++ b/sequencer.c
@@ -701,57 +701,59 @@ static char *get_author(const char *message)
 }
 
 /* Read author-script and return an ident line (author <email> timestamp) */
-static const char *read_author_ident(struct strbuf *buf)
+static int read_author_ident(char **author)
 {
 	const char *keys[] = {
 		"GIT_AUTHOR_NAME=", "GIT_AUTHOR_EMAIL=", "GIT_AUTHOR_DATE="
 	};
+	struct strbuf buf = STRBUF_INIT;
 	struct strbuf out = STRBUF_INIT;
 	char *in, *eol;
 	const char *val[3];
 	int i = 0;
 
-	if (strbuf_read_file(buf, rebase_path_author_script(), 256) <= 0)
-		return NULL;
+	if (strbuf_read_file(&buf, rebase_path_author_script(), 256) <= 0) {
+		strbuf_release(&buf);
+		return -1;
+	}
 
-	/* dequote values and construct ident line in-place */
-	for (in = buf->buf; i < 3 && in - buf->buf < buf->len; i++) {
+	for (in = buf.buf; i < 3 && in - buf.buf < buf.len; i++) {
 		if (!skip_prefix(in, keys[i], (const char **)&in)) {
-			warning("could not parse '%s' (looking for '%s'",
-				rebase_path_author_script(), keys[i]);
-			return NULL;
+			strbuf_release(&buf);
+			return error(_("could not parse '%s' (looking for '%s')"),
+				       rebase_path_author_script(), keys[i]);
 		}
-
 		eol = strchrnul(in, '\n');
 		*eol = '\0';
 		if (!sq_dequote(in)) {
-			warning(_("bad quoting on %s value in '%s'"),
-				keys[i], rebase_path_author_script());
-			return NULL;
+			strbuf_release(&buf);
+			return error(_("bad quoting on %s value in '%s'"),
+				     keys[i], rebase_path_author_script());
 		}
 		val[i] = in;
 		in = eol + 1;
 	}
 
 	if (i < 3) {
-		warning("could not parse '%s' (looking for '%s')",
-			rebase_path_author_script(), keys[i]);
-		return NULL;
+		strbuf_release(&buf);
+		return error(_("could not parse '%s' (looking for '%s')"),
+			     rebase_path_author_script(), keys[i]);
 	}
 
 	/* validate date since fmt_ident() will die() on bad value */
 	if (parse_date(val[2], &out)){
-		warning(_("invalid date format '%s' in '%s'"),
+		error(_("invalid date format '%s' in '%s'"),
 			val[2], rebase_path_author_script());
 		strbuf_release(&out);
-		return NULL;
+		strbuf_release(&buf);
+		return -1;
 	}
 
 	strbuf_reset(&out);
 	strbuf_addstr(&out, fmt_ident(val[0], val[1], val[2], 0));
-	strbuf_swap(buf, &out);
-	strbuf_release(&out);
-	return buf->buf;
+	*author = strbuf_detach(&out, NULL);
+	strbuf_release(&buf);
+	return 0;
 }
 
 static const char staged_changes_advice[] =
@@ -794,12 +796,14 @@ static int run_git_commit(const char *defmsg, struct replay_opts *opts,
 	const char *value;
 
 	if ((flags & CREATE_ROOT_COMMIT) && !(flags & AMEND_MSG)) {
-		struct strbuf msg = STRBUF_INIT, script = STRBUF_INIT;
-		const char *author = is_rebase_i(opts) ?
-			read_author_ident(&script) : NULL;
+		struct strbuf msg = STRBUF_INIT;
+		char *author = NULL;
 		struct object_id root_commit, *cache_tree_oid;
 		int res = 0;
 
+		if (is_rebase_i(opts) && read_author_ident(&author))
+			return -1;
+
 		if (!defmsg)
 			BUG("root commit without message");
 
@@ -817,7 +821,7 @@ static int run_git_commit(const char *defmsg, struct replay_opts *opts,
 					  opts->gpg_sign);
 
 		strbuf_release(&msg);
-		strbuf_release(&script);
+		free(author);
 		if (!res) {
 			update_ref(NULL, "CHERRY_PICK_HEAD", &root_commit, NULL,
 				   REF_NO_DEREF, UPDATE_REFS_MSG_ON_ERR);
-- 
2.18.0


^ permalink raw reply related	[flat|nested] 57+ messages in thread

* [PATCH v3 2/2] sequencer: fix quoting in write_author_script
  2018-08-02 11:20 ` [PATCH v3 0/2] Fix author script quoting Phillip Wood
  2018-08-02 11:20   ` [PATCH v3 1/2] sequencer: handle errors in read_author_ident() Phillip Wood
@ 2018-08-02 11:20   ` Phillip Wood
  2018-08-02 17:27     ` Junio C Hamano
  1 sibling, 1 reply; 57+ messages in thread
From: Phillip Wood @ 2018-08-02 11:20 UTC (permalink / raw)
  To: Git Mailing List, Eric Sunshine
  Cc: Junio C Hamano, Johannes Schindelin, Phillip Wood

From: Phillip Wood <phillip.wood@dunelm.org.uk>

Single quotes should be escaped as \' not \\'. The bad quoting breaks
the interactive version of 'rebase --root' (which is used when there is
no '--onto' even if the user does not specify --interactive) for authors
that contain "'" as sq_dequote() called read_author_ident() errors out
on the bad quoting.

For other interactive rebases this only affects external scripts that
read the author script and users whose git is upgraded from the shell
version of rebase -i while rebase was stopped when the author contains
"'". This is because the parsing in read_env_script() expected the
broken quoting.

This patch includes code to gracefully handle the broken quoting when
git has been upgraded while rebase was stopped. It does this by
recording a version number (currently 1) in
$GIT_DIR/rebase-merge/interactive to indicate that the author-script
was created with correct quoting. Previously this file was always
empty. The fallback path also fixes any missing "'" at the end of the
GIT_AUTHOR_DATE line.

The fallback code has been manually tested by reverting the quoting
fixes in write_author_script() with and without reverting the previous
fix for the missing "'" at the end of the GIT_AUTHOR_DATE line and
running t3404-rebase-interactive.sh.

Ideally rebase and am would share the same code for reading and
writing the author script, but this commit just fixes the immediate
bug.

Helped-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
---

Notes:
    Changes since v2:
    
     - Changed the way it detects and handles broken quoting when git
       is upgraded while rebase is stopped. This means that the fix for
       quoting is no longer tied to the fix for the trailing "'" on the
       GIT_AUTHOR_DATE line. It now also fixes broken quoting in
       read_author_ident() which would error out with the last version.
     - Changed the tests for authors that contain "'". If sq_dequote() and
       parse the author script then the shell should be able to and the
       most important thing is to check that the author in the new commit
       is correct.
     - Fixed an existing memory leak
    
    Note Eric expressed concerns about using a version number, but I had
    already written this code so I thought I send it and see how people
    felt about the additionally complexity. I like the fact that it
    decouple the quoting fix from Eric's patches. In the end the extra
    code is just reading an integer from a text file. The test for adding
    the trailing quote to the author script is quite strict can be
    defeated by a badly edited author script but hopefully that is
    unlikely as it is only invoked if git got upgraded while rebase was
    stopped.

 git-rebase--interactive.sh    |  2 +-
 sequencer.c                   | 89 +++++++++++++++++++++++++++++++----
 sequencer.h                   |  1 +
 t/t3404-rebase-interactive.sh | 20 ++++++--
 4 files changed, 97 insertions(+), 15 deletions(-)

diff --git a/git-rebase--interactive.sh b/git-rebase--interactive.sh
index 06a7b79307..c1e3f947a5 100644
--- a/git-rebase--interactive.sh
+++ b/git-rebase--interactive.sh
@@ -880,7 +880,7 @@ init_basic_state () {
 	mkdir -p "$state_dir" || die "$(eval_gettext "Could not create temporary \$state_dir")"
 	rm -f "$(git rev-parse --git-path REBASE_HEAD)"
 
-	: > "$state_dir"/interactive || die "$(gettext "Could not mark as interactive")"
+	echo 1 > "$state_dir"/interactive || die "$(gettext "Could not mark as interactive")"
 	write_basic_state
 }
 
diff --git a/sequencer.c b/sequencer.c
index 1bf8b0c431..2599f9d80e 100644
--- a/sequencer.c
+++ b/sequencer.c
@@ -44,6 +44,12 @@ static GIT_PATH_FUNC(git_path_head_file, "sequencer/head")
 static GIT_PATH_FUNC(git_path_abort_safety_file, "sequencer/abort-safety")
 
 static GIT_PATH_FUNC(rebase_path, "rebase-merge")
+/*
+ * This file indicates that an interactive rebase is in progress, if it contians
+ * an integer then that is a version indicator for the contents of files in
+ * .git/rebase-merge
+ */
+static GIT_PATH_FUNC(rebase_path_interactive, "rebase-merge/interactive")
 /*
  * The file containing rebase commands, comments, and empty lines.
  * This file is created by "git rebase -i" then edited by the user. As
@@ -636,42 +642,79 @@ static int write_author_script(const char *message)
 		else if (*message != '\'')
 			strbuf_addch(&buf, *(message++));
 		else
-			strbuf_addf(&buf, "'\\\\%c'", *(message++));
+			strbuf_addf(&buf, "'\\%c'", *(message++));
 	strbuf_addstr(&buf, "'\nGIT_AUTHOR_EMAIL='");
 	while (*message && *message != '\n' && *message != '\r')
 		if (skip_prefix(message, "> ", &message))
 			break;
 		else if (*message != '\'')
 			strbuf_addch(&buf, *(message++));
 		else
-			strbuf_addf(&buf, "'\\\\%c'", *(message++));
+			strbuf_addf(&buf, "'\\%c'", *(message++));
 	strbuf_addstr(&buf, "'\nGIT_AUTHOR_DATE='@");
 	while (*message && *message != '\n' && *message != '\r')
 		if (*message != '\'')
 			strbuf_addch(&buf, *(message++));
 		else
-			strbuf_addf(&buf, "'\\\\%c'", *(message++));
+			strbuf_addf(&buf, "'\\%c'", *(message++));
 	strbuf_addch(&buf, '\'');
 	res = write_message(buf.buf, buf.len, rebase_path_author_script(), 1);
 	strbuf_release(&buf);
 	return res;
 }
 
+/*
+ * write_author_script() used to fail to terminate the GIT_AUTHOR_DATE line with
+ * a "'" and also escaped "'" incorrectly as "'\\\\''" rather than "'\\''". Fix
+ * these problems before dequoting in when git was upgraded while rebase was
+ * stopped.
+ */
+static int fix_bad_author_script(struct strbuf *script)
+{
+	const char *next;
+	size_t off = 0;
+
+	while ((next = strstr(script->buf + off, "'\\\\''"))) {
+		off = next - script->buf + 4;
+		strbuf_splice(script, next - script->buf, 5,"'\\''", 4);
+	}
+
+	if ((next = strstr(script->buf, "\nGIT_AUTHOR_DATE='")) &&
+	    (next = strchr(++next, '\n')) &&
+	    ++next - script->buf == script->len) {
+		if (script->buf[script->len - 2] != '\'')
+			strbuf_insert(script, script->len - 1, "'", 1);
+	} else {
+		return error(_("unable to parse '%s'"),
+			     rebase_path_author_script());
+	}
+
+	return 0;
+}
+
 /*
  * Read a list of environment variable assignments (such as the author-script
  * file) into an environment block. Returns -1 on error, 0 otherwise.
  */
-static int read_env_script(struct argv_array *env)
+static int read_env_script(struct replay_opts *opts, struct argv_array *env)
 {
 	struct strbuf script = STRBUF_INIT;
 	int i, count = 0;
-	char *p, *p2;
+	const char *p2;
+	char *p;
 
-	if (strbuf_read_file(&script, rebase_path_author_script(), 256) <= 0)
+	if (strbuf_read_file(&script, rebase_path_author_script(), 256) <= 0) {
+		strbuf_release(&script);
 		return -1;
+	}
+
+	if (!opts->version && fix_bad_author_script(&script)) {
+		strbuf_release(&script);
+		return -1;
+	}
 
 	for (p = script.buf; *p; p++)
-		if (skip_prefix(p, "'\\\\''", (const char **)&p2))
+		if (skip_prefix(p, "'\\''", &p2))
 			strbuf_splice(&script, p - script.buf, p2 - p, "'", 1);
 		else if (*p == '\'')
 			strbuf_splice(&script, p-- - script.buf, 1, "", 0);
@@ -701,7 +744,7 @@ static char *get_author(const char *message)
 }
 
 /* Read author-script and return an ident line (author <email> timestamp) */
-static int read_author_ident(char **author)
+static int read_author_ident(struct replay_opts *opts, char **author)
 {
 	const char *keys[] = {
 		"GIT_AUTHOR_NAME=", "GIT_AUTHOR_EMAIL=", "GIT_AUTHOR_DATE="
@@ -717,6 +760,11 @@ static int read_author_ident(char **author)
 		return -1;
 	}
 
+	if (!opts->version && fix_bad_author_script(&buf)) {
+		strbuf_release(&buf);
+		return -1;
+	}
+
 	for (in = buf.buf; i < 3 && in - buf.buf < buf.len; i++) {
 		if (!skip_prefix(in, keys[i], (const char **)&in)) {
 			strbuf_release(&buf);
@@ -801,7 +849,7 @@ static int run_git_commit(const char *defmsg, struct replay_opts *opts,
 		struct object_id root_commit, *cache_tree_oid;
 		int res = 0;
 
-		if (is_rebase_i(opts) && read_author_ident(&author))
+		if (is_rebase_i(opts) && read_author_ident(opts, &author))
 			return -1;
 
 		if (!defmsg)
@@ -839,7 +887,7 @@ static int run_git_commit(const char *defmsg, struct replay_opts *opts,
 			cmd.err = -1;
 		}
 
-		if (read_env_script(&cmd.env_array)) {
+		if (read_env_script(opts, &cmd.env_array)) {
 			const char *gpg_opt = gpg_sign_opt_quoted(opts);
 
 			return error(_(staged_changes_advice),
@@ -2238,6 +2286,27 @@ static int read_populate_opts(struct replay_opts *opts)
 	if (is_rebase_i(opts)) {
 		struct strbuf buf = STRBUF_INIT;
 
+		if (read_oneliner(&buf, rebase_path_interactive(), 0)) {
+			if (buf.len) {
+				char *end;
+				long version = strtol(buf.buf, &end, 10);
+				if (version < 1 ||version > INT_MAX ||
+				    *end != '\0') {
+					strbuf_release(&buf);
+					return error(_("unable to parse '%s'"),
+						     rebase_path_interactive());
+				}
+				opts->version = (int)version;
+			} else {
+				opts->version = 0;
+			}
+			strbuf_reset(&buf);
+		} else {
+			strbuf_release(&buf);
+			return error(_("unable to read '%s'"),
+				     rebase_path_interactive());
+		}
+
 		if (read_oneliner(&buf, rebase_path_gpg_sign_opt(), 1)) {
 			if (!starts_with(buf.buf, "-S"))
 				strbuf_reset(&buf);
diff --git a/sequencer.h b/sequencer.h
index c5787c6b56..24d69e7ff7 100644
--- a/sequencer.h
+++ b/sequencer.h
@@ -34,6 +34,7 @@ struct replay_opts {
 	int keep_redundant_commits;
 	int verbose;
 
+	int version;
 	int mainline;
 
 	char *gpg_sign;
diff --git a/t/t3404-rebase-interactive.sh b/t/t3404-rebase-interactive.sh
index b72167ecd5..4e2e787f26 100755
--- a/t/t3404-rebase-interactive.sh
+++ b/t/t3404-rebase-interactive.sh
@@ -75,6 +75,7 @@ test_expect_success 'rebase --keep-empty' '
 	test_line_count = 6 actual
 '
 
+SQ="'"
 test_expect_success 'rebase -i with the exec command' '
 	git checkout master &&
 	(
@@ -1361,7 +1362,6 @@ test_expect_success 'editor saves as CR/LF' '
 	)
 '
 
-SQ="'"
 test_expect_success 'rebase -i --gpg-sign=<key-id>' '
 	test_when_finished "test_might_fail git rebase --abort" &&
 	set_fake_editor &&
@@ -1382,9 +1382,21 @@ test_expect_success 'rebase -i --gpg-sign=<key-id> overrides commit.gpgSign' '
 test_expect_success 'valid author header after --root swap' '
 	rebase_setup_and_clean author-header no-conflict-branch &&
 	set_fake_editor &&
-	FAKE_LINES="2 1" git rebase -i --root &&
-	git cat-file commit HEAD^ >out &&
-	grep "^author ..*> [0-9][0-9]* [-+][0-9][0-9][0-9][0-9]$" out
+	git commit --amend --author="Au ${SQ}thor <author@example.com>" --no-edit &&
+	git cat-file commit HEAD | grep ^author >expected &&
+	FAKE_LINES="5 1" git rebase -i --root &&
+	git cat-file commit HEAD^ | grep ^author >actual &&
+	test_cmp expected actual
+'
+
+test_expect_success 'valid author header when author contains single quote' '
+	rebase_setup_and_clean author-header no-conflict-branch &&
+	set_fake_editor &&
+	git commit --amend --author="Au ${SQ}thor <author@example.com>" --no-edit &&
+	git cat-file commit HEAD | grep ^author >expected &&
+	FAKE_LINES="1 5" git rebase -i --root &&
+	git cat-file commit HEAD | grep ^author >actual &&
+	test_cmp expected actual
 '
 
 test_done
-- 
2.18.0


^ permalink raw reply related	[flat|nested] 57+ messages in thread

* Re: [PATCH v3 2/2] sequencer: fix quoting in write_author_script
  2018-08-02 11:20   ` [PATCH v3 2/2] sequencer: fix quoting in write_author_script Phillip Wood
@ 2018-08-02 17:27     ` Junio C Hamano
  2018-08-03  7:59       ` Eric Sunshine
  0 siblings, 1 reply; 57+ messages in thread
From: Junio C Hamano @ 2018-08-02 17:27 UTC (permalink / raw)
  To: Phillip Wood, Alban Gruin
  Cc: Git Mailing List, Eric Sunshine, Johannes Schindelin,
	Phillip Wood

Phillip Wood <phillip.wood@talktalk.net> writes:

> From: Phillip Wood <phillip.wood@dunelm.org.uk>
>
> Single quotes should be escaped as \' not \\'. The bad quoting breaks
> the interactive version of 'rebase --root' (which is used when there is
> no '--onto' even if the user does not specify --interactive) for authors
> that contain "'" as sq_dequote() called read_author_ident() errors out
> on the bad quoting.
>
> For other interactive rebases this only affects external scripts that
> read the author script and users whose git is upgraded from the shell
> version of rebase -i while rebase was stopped when the author contains
> "'". This is because the parsing in read_env_script() expected the
> broken quoting.

I wasn't following the discussion, but is it the general consensus
that reading the broken a-i file is a requirement for the new code?
Not an objection phrased as a question.

I do not think it is worth worrying about the "upgrade while rebase
was in progress" case, if it involves much more code than necessary
without its support, especially if the only thing the user needs to
do recover from such a situation is to say "rebase --abort" and then
to retry the same rebase with the fixed version that was installed
in the meantime.  Let's see how much we need to bend over backwards
to do this "transition" thing.

> Ideally rebase and am would share the same code for reading and
> writing the author script, but this commit just fixes the immediate
> bug.

OK.

> diff --git a/git-rebase--interactive.sh b/git-rebase--interactive.sh
> index 06a7b79307..c1e3f947a5 100644
> --- a/git-rebase--interactive.sh
> +++ b/git-rebase--interactive.sh
> @@ -880,7 +880,7 @@ init_basic_state () {
>  	mkdir -p "$state_dir" || die "$(eval_gettext "Could not create temporary \$state_dir")"
>  	rm -f "$(git rev-parse --git-path REBASE_HEAD)"
>  
> -	: > "$state_dir"/interactive || die "$(gettext "Could not mark as interactive")"
> +	echo 1 > "$state_dir"/interactive || die "$(gettext "Could not mark as interactive")"

This impacts the work Alban is doing, which at the end removes this
script altogether.

> +/*
> + * write_author_script() used to fail to terminate the GIT_AUTHOR_DATE line with
> + * a "'" and also escaped "'" incorrectly as "'\\\\''" rather than "'\\''". Fix

I think the comment here (for both the wrong and the right versions)
is easier to read if you wrote the string as literal without C, i.e.
The string is "'\\''" but as a string literal in C it is expressed
as "'\\\\''".

> +static int fix_bad_author_script(struct strbuf *script)
> +{
> +	const char *next;
> +	size_t off = 0;
> +
> +	while ((next = strstr(script->buf + off, "'\\\\''"))) {

This looks brittle.

We need assurance that the first "'\\''" we see on the line came
from the attempt by the broken writer to write out a single "'", and
not from anything else.  The broken writer places its own "'"
immediately after GIT_AUTHOR_NAME= (just like the corrected one
does) before moving on to the end-user payload.  Can the single
quote at the beginning of the substring you are looking for be that
one?  If the end user's payload began with two backslashes, that
would have produced a result that matches the first three bytes of
the substring you are looking for.  But there is no way for the
end-user payload to make the next two bytes "''"---any byte other
than a sq would result in a sq added to the result, and a byte that
is a sq would give one sq followed by a bs.

OK, so this is probably doing the right thing, as long as we know we
are reading from the old and broken writer.  It still does look
unnecessarily ugly and over-engineered to have this (and the
"version" reading code), though, at least to me, but perhaps it is
just me.

Thanks.



^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v3 1/2] sequencer: handle errors in read_author_ident()
  2018-08-02 11:20   ` [PATCH v3 1/2] sequencer: handle errors in read_author_ident() Phillip Wood
@ 2018-08-03  7:09     ` Eric Sunshine
  2018-08-03 15:53       ` Junio C Hamano
  0 siblings, 1 reply; 57+ messages in thread
From: Eric Sunshine @ 2018-08-03  7:09 UTC (permalink / raw)
  To: Phillip Wood; +Cc: Git List, Junio C Hamano, Johannes Schindelin

On Thu, Aug 2, 2018 at 7:20 AM Phillip Wood <phillip.wood@talktalk.net> wrote:
> The calling code did not treat NULL as an error. Instead NULL caused
> it to fallback to using the default author when creating the new
> commit. This changed the date and potentially the author of the
> commit which corrupted the author data compared to its expected
> value. Fix this by returning and integer and passing in a parameter to
> receive the author.
>
> Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
> ---
> diff --git a/sequencer.c b/sequencer.c
> @@ -701,57 +701,59 @@ static char *get_author(const char *message)
> -static const char *read_author_ident(struct strbuf *buf)
> +static int read_author_ident(char **author)
>  {
> +       struct strbuf buf = STRBUF_INIT;
> -       if (strbuf_read_file(buf, rebase_path_author_script(), 256) <= 0)
> -               return NULL;
> +       if (strbuf_read_file(&buf, rebase_path_author_script(), 256) <= 0) {
> +               strbuf_release(&buf);
> +               return -1;
> +       }

[...much noisiness snipped...]

> @@ -794,12 +796,14 @@ static int run_git_commit(const char *defmsg, struct replay_opts *opts,
>         if ((flags & CREATE_ROOT_COMMIT) && !(flags & AMEND_MSG)) {
> -               struct strbuf msg = STRBUF_INIT, script = STRBUF_INIT;
> -               const char *author = is_rebase_i(opts) ?
> -                       read_author_ident(&script) : NULL;
> +               struct strbuf msg = STRBUF_INIT;
> +               char *author = NULL;
>
> +               if (is_rebase_i(opts) && read_author_ident(&author))
> +                       return -1;
> +

I think this patch can be simplified considerably by shifting one's
perspective. If we admit that read_author_ident() is already correctly
reporting an error by returning NULL (which is exactly what it is
doing), then the bug is is purely on the calling side; namely, the
caller is ignoring the error. (In fact, your commit message already
states this.)

So, if it's the caller which is buggy, then read_author_ident() does
not require _any_ changes (meaning all the noisiness in this patch can
be dropped), and only the caller needs a fix, and that change can be
quite tiny. To wit, instead of initializing 'author' like this (as
done in the current "buggy" code):

    const char *author = is_rebase_i(opts) ?
        read_author_ident(&script) : NULL;

Do this instead:

    const char *author = NULL;
    ...
    if (is_rebase_i(opts)) {
        author = read_author_ident(&script);
        if (!author) {
            strbuf_release(&script);
            return -1;
        }
    }

That's it, a minimal fix giving the same result, without all the code
churn, thus safer (less opportunity, for instance, to introduce a leak
as in v1). Of course, you can collapse the above even further to:

    if (is_rebase_i(opts) &&
        !(author = read_author_ident(&script)) {
            strbuf_release(&script);
            return -1;
    }

Though, I think that is less readable.

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v3 2/2] sequencer: fix quoting in write_author_script
  2018-08-02 17:27     ` Junio C Hamano
@ 2018-08-03  7:59       ` Eric Sunshine
  2018-08-03  9:33         ` Phillip Wood
  0 siblings, 1 reply; 57+ messages in thread
From: Eric Sunshine @ 2018-08-03  7:59 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Phillip Wood, Alban Gruin, Git List, Johannes Schindelin,
	Phillip Wood

On Thu, Aug 2, 2018 at 1:27 PM Junio C Hamano <gitster@pobox.com> wrote:
> Phillip Wood <phillip.wood@talktalk.net> writes:
> > For other interactive rebases this only affects external scripts that
> > read the author script and users whose git is upgraded from the shell
> > version of rebase -i while rebase was stopped when the author contains
> > "'". This is because the parsing in read_env_script() expected the
> > broken quoting.
>
> I wasn't following the discussion, but is it the general consensus
> that reading the broken a-i file is a requirement for the new code?
> Not an objection phrased as a question.
>
> I do not think it is worth worrying about the "upgrade while rebase
> was in progress" case, if it involves much more code than necessary
> without its support, especially if the only thing the user needs to
> do recover from such a situation is to say "rebase --abort" and then
> to retry the same rebase with the fixed version that was installed
> in the meantime. [...]
>
> [...] It still does look
> unnecessarily ugly and over-engineered to have this (and the
> "version" reading code), though, at least to me, but perhaps it is
> just me.

It's not just you. I also questioned[1] if such backward compatibility
was needed, and had concerns[2] about a version file being heavyweight
and over-engineered.

This is a lot of new code (possibly harboring its own bugs) for a
situation unlikely to arise, and which becomes ever more unlikely as
time passes. Also, unlike long-lived (years or decades) resources,
such as a repository or pack file, for instance, for which a version
number makes sense, this file is very short-lived, which makes it even
more difficult to justify adding this much machinery for something so
unlikely to arise in practice.

The overall aim of this series to fix these bugs is laudable, but I
would be happy to see this one reduced to just a "bug fix" patch
without all the backward-compatibility machinery (and wouldn't mind
seeing patch 1/2 simplified[3], as well).

Thanks.

[1]: https://public-inbox.org/git/CAPig+cR5VHP8muo5_A_9t7OPZam8O_uPb0nd73B15Ye92n+p7Q@mail.gmail.com/
[2]: https://public-inbox.org/git/CAPig+cTttbV2FjnoS_SZtwh2J4wwzsbK+48BAbt1cV0utynYzw@mail.gmail.com/
[3]: https://public-inbox.org/git/CAPig+cSZ3Zm=qFcvGjyj_uStn=JMAYuskMa0O_2yxkKjaRWTSg@mail.gmail.com/

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v3 2/2] sequencer: fix quoting in write_author_script
  2018-08-03  7:59       ` Eric Sunshine
@ 2018-08-03  9:33         ` Phillip Wood
  2018-08-03 10:02           ` Eric Sunshine
  0 siblings, 1 reply; 57+ messages in thread
From: Phillip Wood @ 2018-08-03  9:33 UTC (permalink / raw)
  To: Eric Sunshine, Junio C Hamano
  Cc: Alban Gruin, Git List, Johannes Schindelin, Phillip Wood

Dear Eric and Junio
On 03/08/18 08:59, Eric Sunshine wrote:
> On Thu, Aug 2, 2018 at 1:27 PM Junio C Hamano <gitster@pobox.com> wrote:
>> Phillip Wood <phillip.wood@talktalk.net> writes:
>>> For other interactive rebases this only affects external scripts that
>>> read the author script and users whose git is upgraded from the shell
>>> version of rebase -i while rebase was stopped when the author contains
>>> "'". This is because the parsing in read_env_script() expected the
>>> broken quoting.
>>
>> I wasn't following the discussion, but is it the general consensus
>> that reading the broken a-i file is a requirement for the new code?
>> Not an objection phrased as a question.
>>
>> I do not think it is worth worrying about the "upgrade while rebase
>> was in progress" case, if it involves much more code than necessary
>> without its support, especially if the only thing the user needs to
>> do recover from such a situation is to say "rebase --abort" and then
>> to retry the same rebase with the fixed version that was installed
>> in the meantime. [...]
>>
>> [...] It still does look
>> unnecessarily ugly and over-engineered to have this (and the
>> "version" reading code), though, at least to me, but perhaps it is
>> just me.
> 
> It's not just you. I also questioned[1] if such backward compatibility
> was needed, and had concerns[2] about a version file being heavyweight
> and over-engineered.

If there isn't some backward compatibility then if git gets upgraded
while rebase is stopped then the author data will be silently corrupted
if it contains "'". read_author_ident() will error out but that is only
used for the root commit. read_env_script() which is used for normal
picks will not dequote the badly quoted value correctly and will not
return an error. It is unlikely but possible, I'll leave it to Junio to
decide if it is worth it

> 
> This is a lot of new code (possibly harboring its own bugs) for a
> situation unlikely to arise, and which becomes ever more unlikely as
> time passes. Also, unlike long-lived (years or decades) resources,
> such as a repository or pack file, for instance, for which a version
> number makes sense, this file is very short-lived, which makes it even
> more difficult to justify adding this much machinery for something so
> unlikely to arise in practice.

There is a precedent for adding backwards compatibility 84df4560ed
("rebase: extract code for writing basic state", 2011-02-06) though it
is much simpler. Part of the commit message reads

    Note that non-interactive rebase stores the sha1 of the
    original head in a file called orig-head, while interactive
    rebase stores it in a file called head. Change this by
    writing to orig-head in both cases. When reading, try to read
    from orig-head. If that fails, read from head instead. This
    protects users who upgraded git while they had an ongoing
    interactive rebase, while still making it possible to remove
    the code that reads from head at some point in the future.

Best Wishes

Phillip

> The overall aim of this series to fix these bugs is laudable, but I
> would be happy to see this one reduced to just a "bug fix" patch
> without all the backward-compatibility machinery (and wouldn't mind
> seeing patch 1/2 simplified[3], as well).
> 
> Thanks.
> 
> [1]: https://public-inbox.org/git/CAPig+cR5VHP8muo5_A_9t7OPZam8O_uPb0nd73B15Ye92n+p7Q@mail.gmail.com/
> [2]: https://public-inbox.org/git/CAPig+cTttbV2FjnoS_SZtwh2J4wwzsbK+48BAbt1cV0utynYzw@mail.gmail.com/
> [3]: https://public-inbox.org/git/CAPig+cSZ3Zm=qFcvGjyj_uStn=JMAYuskMa0O_2yxkKjaRWTSg@mail.gmail.com/
> 


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v3 2/2] sequencer: fix quoting in write_author_script
  2018-08-03  9:33         ` Phillip Wood
@ 2018-08-03 10:02           ` Eric Sunshine
  2018-08-03 14:12             ` Phillip Wood
  0 siblings, 1 reply; 57+ messages in thread
From: Eric Sunshine @ 2018-08-03 10:02 UTC (permalink / raw)
  To: Phillip Wood; +Cc: Junio C Hamano, Alban Gruin, Git List, Johannes Schindelin

On Fri, Aug 3, 2018 at 5:33 AM Phillip Wood <phillip.wood@talktalk.net> wrote:
> If there isn't some backward compatibility then if git gets upgraded
> while rebase is stopped then the author data will be silently corrupted
> if it contains "'". read_author_ident() will error out but that is only
> used for the root commit. read_env_script() which is used for normal
> picks will not dequote the badly quoted value correctly and will not
> return an error. It is unlikely but possible, I'll leave it to Junio to
> decide if it is worth it

If I understand correctly, the approach you implemented earlier[1]
(perhaps coupled with the more robust detection suggested here[2])
would be sufficient to handle this backward compatibility concern.
While it may not be as pretty or generalized as the current patch, it
involves far less machinery, thus is less likely to harbor its own
bugs. The earlier version is also much more self-contained, which
makes it easier to drop at some point when backward compatibility is
no longer a concern (if ever).

> There is a precedent for adding backwards compatibility 84df4560ed
> ("rebase: extract code for writing basic state", 2011-02-06) though it
> is much simpler.

Indeed, it is much simpler, adding a one-liner 'else' case to an
'if-then' for backward compatibility. Your earlier implementation[1]
was pretty much the equivalent, just adding an extra one-liner arm to
an 'if-then' statement.

The bug fix itself is important, and, while I do favor the cleaner
approach of not worrying about backward compatibility for this fairly
unlikely case, your earlier version seems a better compromise between
having no backward compatibility and the much more heavyweight version
implemented here.

Anyhow, I'm fine with whatever Junio decides.

[1]: https://public-inbox.org/git/20180731111532.9358-3-phillip.wood@talktalk.net/
[2]: https://public-inbox.org/git/CAPig+cTttbV2FjnoS_SZtwh2J4wwzsbK+48BAbt1cV0utynYzw@mail.gmail.com/

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v3 2/2] sequencer: fix quoting in write_author_script
  2018-08-03 10:02           ` Eric Sunshine
@ 2018-08-03 14:12             ` Phillip Wood
  2018-08-07 17:20               ` Junio C Hamano
  0 siblings, 1 reply; 57+ messages in thread
From: Phillip Wood @ 2018-08-03 14:12 UTC (permalink / raw)
  To: Eric Sunshine, Phillip Wood
  Cc: Junio C Hamano, Alban Gruin, Git List, Johannes Schindelin

Hi Eric
On 03/08/18 11:02, Eric Sunshine wrote:
> On Fri, Aug 3, 2018 at 5:33 AM Phillip Wood <phillip.wood@talktalk.net> wrote:
>> If there isn't some backward compatibility then if git gets upgraded
>> while rebase is stopped then the author data will be silently corrupted
>> if it contains "'". read_author_ident() will error out but that is only
>> used for the root commit. read_env_script() which is used for normal
>> picks will not dequote the badly quoted value correctly and will not
>> return an error. It is unlikely but possible, I'll leave it to Junio to
>> decide if it is worth it
> 
> If I understand correctly, the approach you implemented earlier[1]
> (perhaps coupled with the more robust detection suggested here[2])
> would be sufficient to handle this backward compatibility concern.
> While it may not be as pretty or generalized as the current patch, it
> involves far less machinery, thus is less likely to harbor its own
> bugs. The earlier version is also much more self-contained, which
> makes it easier to drop at some point when backward compatibility is
> no longer a concern (if ever).

Yes I think the earlier approach with the more robust detection you 
suggested is probably a good compromise. Junio does that sound good to you?

Best Wishes

Phillip

>> There is a precedent for adding backwards compatibility 84df4560ed
>> ("rebase: extract code for writing basic state", 2011-02-06) though it
>> is much simpler.
> 
> Indeed, it is much simpler, adding a one-liner 'else' case to an
> 'if-then' for backward compatibility. Your earlier implementation[1]
> was pretty much the equivalent, just adding an extra one-liner arm to
> an 'if-then' statement.
> 
> The bug fix itself is important, and, while I do favor the cleaner
> approach of not worrying about backward compatibility for this fairly
> unlikely case, your earlier version seems a better compromise between
> having no backward compatibility and the much more heavyweight version
> implemented here.
> 
> Anyhow, I'm fine with whatever Junio decides.
> 
> [1]: https://public-inbox.org/git/20180731111532.9358-3-phillip.wood@talktalk.net/
> [2]: https://public-inbox.org/git/CAPig+cTttbV2FjnoS_SZtwh2J4wwzsbK+48BAbt1cV0utynYzw@mail.gmail.com/
> 


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v3 1/2] sequencer: handle errors in read_author_ident()
  2018-08-03  7:09     ` Eric Sunshine
@ 2018-08-03 15:53       ` Junio C Hamano
  0 siblings, 0 replies; 57+ messages in thread
From: Junio C Hamano @ 2018-08-03 15:53 UTC (permalink / raw)
  To: Eric Sunshine; +Cc: Phillip Wood, Git List, Johannes Schindelin

Eric Sunshine <sunshine@sunshineco.com> writes:

> I think this patch can be simplified considerably by shifting one's
> perspective. If we admit that read_author_ident() is already correctly
> reporting an error by returning NULL (which is exactly what it is
> doing), then the bug is is purely on the calling side; namely, the
> caller is ignoring the error. (In fact, your commit message already
> states this.)

This approach looks quite sensible.

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v2 0/4] fix "rebase -i --root" corrupting root commit
  2018-08-01  6:22   ` Eric Sunshine
@ 2018-08-07  1:19     ` Hilco Wijbenga
  2018-08-07  3:31       ` Eric Sunshine
  0 siblings, 1 reply; 57+ messages in thread
From: Hilco Wijbenga @ 2018-08-07  1:19 UTC (permalink / raw)
  To: Eric Sunshine
  Cc: Git List, Johannes Schindelin, Junio C Hamano, Phillip Wood,
	Akinori MUSHA

On Tue, Jul 31, 2018 at 11:22 PM, Eric Sunshine <sunshine@sunshineco.com> wrote:
> On Tue, Jul 31, 2018 at 9:30 PM Hilco Wijbenga <hilco.wijbenga@gmail.com> wrote:
>> On Tue, Jul 31, 2018 at 12:33 AM, Eric Sunshine <sunshine@sunshineco.com> wrote:
>> > This is a re-roll of [1] which fixes sequencer bugs resulting in commit
>> > object corruption when "rebase -i --root" swaps in a new commit as root.
>> > Unfortunately, those bugs made it into v2.18.0 and have already
>> > corrupted at least one repository (a local project of mine). Patches 3/4
>> > and 4/4 are new.
>>
>> Does this also fix losing the initial commit if it is empty?
>>
>> git init ; git commit -m 'Initial commit' --allow-empty ; touch
>> file.txt ; git add file.txt ; git commit -m 'Add file.txt' ; git
>> rebase --root
>>
>> I would expect there to be 2 commits but the first one has
>> disappeared. (This usually happens with "git rebase -i --root" early
>> on in a new project.)
>
> This series should have no impact on that issue; it is fixing root
> commit object corruption, and does not touch or change how the commit
> graph is maintained or how the rebasing machinery itself works (and
> the --allow-empty stuff is part of that machinery).
>
> As Dscho is cc:'d already, perhaps he can chime in as a resident expert.
>
> By the way, what happens if you use --keep-empty while rebasing?

I was not aware of that flag. But, although I was expecting it to, if
I use it with the rebase, I don't see any different behaviour. I can't
really make out what its purpose is from "Keep the commits that do not
change anything from its parents in the result.".

But your suggestion did make me think about what behaviour I would
like to see, exactly. I like that Git removes commits that no longer
serve any purpose (because I've included their changes in earlier
commits). So I would not want to keep commits that become empty during
the rebase. What I would like to see is that commits that _start out_
as empty, are retained. Would such behaviour make sense? Or would that
be considered surprising behaviour?

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v2 0/4] fix "rebase -i --root" corrupting root commit
  2018-08-07  1:19     ` Hilco Wijbenga
@ 2018-08-07  3:31       ` Eric Sunshine
  2018-08-07 21:09         ` Junio C Hamano
  2018-08-27 22:34         ` Johannes Schindelin
  0 siblings, 2 replies; 57+ messages in thread
From: Eric Sunshine @ 2018-08-07  3:31 UTC (permalink / raw)
  To: Hilco Wijbenga
  Cc: Git List, Johannes Schindelin, Junio C Hamano, Phillip Wood,
	Akinori MUSHA

On Mon, Aug 6, 2018 at 9:20 PM Hilco Wijbenga <hilco.wijbenga@gmail.com> wrote:
> But your suggestion did make me think about what behaviour I would
> like to see, exactly. I like that Git removes commits that no longer
> serve any purpose (because I've included their changes in earlier
> commits). So I would not want to keep commits that become empty during
> the rebase. What I would like to see is that commits that _start out_
> as empty, are retained. Would such behaviour make sense? Or would that
> be considered surprising behaviour?

I, personally, have no opinion since I don't use empty commits.
Perhaps someone more experienced and more long-sighted will chime in.

^ permalink raw reply	[flat|nested] 57+ messages in thread

* [PATCH v4 0/2] fix author-script quoting
  2018-07-31  7:33 [PATCH v2 0/4] fix "rebase -i --root" corrupting root commit Eric Sunshine
                   ` (8 preceding siblings ...)
  2018-08-02 11:20 ` [PATCH v3 0/2] Fix author script quoting Phillip Wood
@ 2018-08-07  9:34 ` Phillip Wood
  2018-08-07  9:34   ` [PATCH v4 1/2] sequencer: handle errors from read_author_ident() Phillip Wood
                     ` (2 more replies)
  9 siblings, 3 replies; 57+ messages in thread
From: Phillip Wood @ 2018-08-07  9:34 UTC (permalink / raw)
  To: Git Mailing List, Eric Sunshine
  Cc: Junio C Hamano, Johannes Schindelin, Phillip Wood

From: Phillip Wood <phillip.wood@dunelm.org.uk>

I've updated these based on Eric's suggestions, hopefully they're good
to go now. Thanks Eric for you help.

Phillip Wood (2):
  sequencer: handle errors from read_author_ident()
  sequencer: fix quoting in write_author_script

 sequencer.c                   | 47 ++++++++++++++++++++++++++++-------
 t/t3404-rebase-interactive.sh | 18 +++++++++++---
 2 files changed, 53 insertions(+), 12 deletions(-)

-- 
2.18.0


^ permalink raw reply	[flat|nested] 57+ messages in thread

* [PATCH v4 1/2] sequencer: handle errors from read_author_ident()
  2018-08-07  9:34 ` [PATCH v4 0/2] fix author-script quoting Phillip Wood
@ 2018-08-07  9:34   ` Phillip Wood
  2018-08-08  9:43     ` Eric Sunshine
  2018-08-07  9:34   ` [PATCH v4 2/2] sequencer: fix quoting in write_author_script Phillip Wood
  2018-08-08  9:51   ` [PATCH v4 0/2] fix author-script quoting Eric Sunshine
  2 siblings, 1 reply; 57+ messages in thread
From: Phillip Wood @ 2018-08-07  9:34 UTC (permalink / raw)
  To: Git Mailing List, Eric Sunshine
  Cc: Junio C Hamano, Johannes Schindelin, Phillip Wood

From: Phillip Wood <phillip.wood@dunelm.org.uk>

Check for a NULL return value from read_author_ident() that indicates
an error. Previously the NULL author was passed to commit_tree() which
would then fallback to using the default author when creating the new
commit. This changed the date and potentially the author of the commit
which corrupted the author data compared to its expected value.

Helped-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
---

Notes:
    changes since v3:
    
     - Implemented the simpler scheme suggested by Eric

 sequencer.c | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/sequencer.c b/sequencer.c
index 944dea6997..c4e4418559 100644
--- a/sequencer.c
+++ b/sequencer.c
@@ -795,11 +795,18 @@ static int run_git_commit(const char *defmsg, struct replay_opts *opts,
 
 	if ((flags & CREATE_ROOT_COMMIT) && !(flags & AMEND_MSG)) {
 		struct strbuf msg = STRBUF_INIT, script = STRBUF_INIT;
-		const char *author = is_rebase_i(opts) ?
-			read_author_ident(&script) : NULL;
+		const char *author = NULL;
 		struct object_id root_commit, *cache_tree_oid;
 		int res = 0;
 
+		if (is_rebase_i(opts)) {
+			author = read_author_ident(&script);
+			if (!author) {
+				strbuf_release(&script);
+				return -1;
+			}
+		}
+
 		if (!defmsg)
 			BUG("root commit without message");
 
-- 
2.18.0


^ permalink raw reply related	[flat|nested] 57+ messages in thread

* [PATCH v4 2/2] sequencer: fix quoting in write_author_script
  2018-08-07  9:34 ` [PATCH v4 0/2] fix author-script quoting Phillip Wood
  2018-08-07  9:34   ` [PATCH v4 1/2] sequencer: handle errors from read_author_ident() Phillip Wood
@ 2018-08-07  9:34   ` Phillip Wood
  2018-08-07 10:23     ` Eric Sunshine
  2018-08-08  9:39     ` Eric Sunshine
  2018-08-08  9:51   ` [PATCH v4 0/2] fix author-script quoting Eric Sunshine
  2 siblings, 2 replies; 57+ messages in thread
From: Phillip Wood @ 2018-08-07  9:34 UTC (permalink / raw)
  To: Git Mailing List, Eric Sunshine
  Cc: Junio C Hamano, Johannes Schindelin, Phillip Wood

From: Phillip Wood <phillip.wood@dunelm.org.uk>

Single quotes should be escaped as \' not \\'. The bad quoting breaks
the interactive version of 'rebase --root' (which is used when there
is no '--onto' even if the user does not specify --interactive) for
authors that contain "'" as sq_dequote() called by read_author_ident()
errors out on the bad quoting.

For other interactive rebases this only affects external scripts that
read the author script and users whose git is upgraded from the shell
version of rebase -i while rebase was stopped when the author contains
"'". This is because the parsing in read_env_script() expected the
broken quoting.

This patch includes code to handle the broken quoting when
git has been upgraded while rebase was stopped. It does this by
detecting the missing "'" at the end of the GIT_AUTHOR_DATE line to see
if it should dequote \\' as "'". Note this is only implemented for
normal picks, not for creating a new root commit (rebase will stop with
an error complaining out bad quoting in that case).

The fallback code has been manually tested by reverting both the quoting
fixes in write_author_script() and the previous fix for the missing "'"
at the end of the GIT_AUTHOR_DATE line and running
t3404-rebase-interactive.sh.

Ideally rebase and am would share the same code for reading and
writing the author script, but this commit just fixes the immediate
bug.

Helped-by: Eric Sunshine <sunshine@sunshineco.com>
Helped-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
---

Notes:
    changes since v3:
    
     - Reverted the implementation to v2 with more robust detection of the
       missing "'" on the last line of the author script based on a
       suggestion by Eric. This means that this series needs to progress
       closely with Eric's series of fixes or the fallback code will never
       be called.
    
    - Tweaked the last test

 sequencer.c                   | 36 ++++++++++++++++++++++++++++-------
 t/t3404-rebase-interactive.sh | 18 +++++++++++++++---
 2 files changed, 44 insertions(+), 10 deletions(-)

diff --git a/sequencer.c b/sequencer.c
index c4e4418559..ba11fe5bca 100644
--- a/sequencer.c
+++ b/sequencer.c
@@ -636,42 +636,64 @@ static int write_author_script(const char *message)
 		else if (*message != '\'')
 			strbuf_addch(&buf, *(message++));
 		else
-			strbuf_addf(&buf, "'\\\\%c'", *(message++));
+			strbuf_addf(&buf, "'\\%c'", *(message++));
 	strbuf_addstr(&buf, "'\nGIT_AUTHOR_EMAIL='");
 	while (*message && *message != '\n' && *message != '\r')
 		if (skip_prefix(message, "> ", &message))
 			break;
 		else if (*message != '\'')
 			strbuf_addch(&buf, *(message++));
 		else
-			strbuf_addf(&buf, "'\\\\%c'", *(message++));
+			strbuf_addf(&buf, "'\\%c'", *(message++));
 	strbuf_addstr(&buf, "'\nGIT_AUTHOR_DATE='@");
 	while (*message && *message != '\n' && *message != '\r')
 		if (*message != '\'')
 			strbuf_addch(&buf, *(message++));
 		else
-			strbuf_addf(&buf, "'\\\\%c'", *(message++));
+			strbuf_addf(&buf, "'\\%c'", *(message++));
 	strbuf_addch(&buf, '\'');
 	res = write_message(buf.buf, buf.len, rebase_path_author_script(), 1);
 	strbuf_release(&buf);
 	return res;
 }
 
+
+/*
+ * write_author_script() used to fail to terminate the last line with a "'" and
+ * also escaped "'" incorrectly as "'\\\\''" rather than "'\\''". We check for
+ * the terminating "'" on the last line to see how "'" has been escaped in case
+ * git was upgraded while rebase was stopped.
+ */
+static int quoting_is_broken(const char *s, size_t n)
+{
+	/* Skip any empty lines in case the file was hand edited */
+	while (n > 0 && s[--n] == '\n')
+		; /* empty */
+	if (n > 0 && s[n] != '\'')
+		return 1;
+
+	return 0;
+}
+
 /*
  * Read a list of environment variable assignments (such as the author-script
  * file) into an environment block. Returns -1 on error, 0 otherwise.
  */
 static int read_env_script(struct argv_array *env)
 {
 	struct strbuf script = STRBUF_INIT;
-	int i, count = 0;
-	char *p, *p2;
+	int i, count = 0, sq_bug;
+	const char *p2;
+	char *p;
 
 	if (strbuf_read_file(&script, rebase_path_author_script(), 256) <= 0)
 		return -1;
-
+	/* write_author_script() used to quote incorrectly */
+	sq_bug = quoting_is_broken(script.buf, script.len);
 	for (p = script.buf; *p; p++)
-		if (skip_prefix(p, "'\\\\''", (const char **)&p2))
+		if (sq_bug && skip_prefix(p, "'\\\\''", &p2))
+			strbuf_splice(&script, p - script.buf, p2 - p, "'", 1);
+		else if (skip_prefix(p, "'\\''", &p2))
 			strbuf_splice(&script, p - script.buf, p2 - p, "'", 1);
 		else if (*p == '\'')
 			strbuf_splice(&script, p-- - script.buf, 1, "", 0);
diff --git a/t/t3404-rebase-interactive.sh b/t/t3404-rebase-interactive.sh
index b72167ecd5..834a124d60 100755
--- a/t/t3404-rebase-interactive.sh
+++ b/t/t3404-rebase-interactive.sh
@@ -1382,9 +1382,21 @@ test_expect_success 'rebase -i --gpg-sign=<key-id> overrides commit.gpgSign' '
 test_expect_success 'valid author header after --root swap' '
 	rebase_setup_and_clean author-header no-conflict-branch &&
 	set_fake_editor &&
-	FAKE_LINES="2 1" git rebase -i --root &&
-	git cat-file commit HEAD^ >out &&
-	grep "^author ..*> [0-9][0-9]* [-+][0-9][0-9][0-9][0-9]$" out
+	git commit --amend --author="Au ${SQ}thor <author@example.com>" --no-edit &&
+	git cat-file commit HEAD | grep ^author >expected &&
+	FAKE_LINES="5 1" git rebase -i --root &&
+	git cat-file commit HEAD^ | grep ^author >actual &&
+	test_cmp expected actual
+'
+
+test_expect_success 'valid author header when author contains single quote' '
+	rebase_setup_and_clean author-header no-conflict-branch &&
+	set_fake_editor &&
+	git commit --amend --author="Au ${SQ}thor <author@example.com>" --no-edit &&
+	git cat-file commit HEAD | grep ^author >expected &&
+	FAKE_LINES="2" git rebase -i HEAD~2 &&
+	git cat-file commit HEAD | grep ^author >actual &&
+	test_cmp expected actual
 '
 
 test_done
-- 
2.18.0


^ permalink raw reply related	[flat|nested] 57+ messages in thread

* Re: [PATCH v4 2/2] sequencer: fix quoting in write_author_script
  2018-08-07  9:34   ` [PATCH v4 2/2] sequencer: fix quoting in write_author_script Phillip Wood
@ 2018-08-07 10:23     ` Eric Sunshine
  2018-08-07 13:54       ` Phillip Wood
  2018-08-08  9:39     ` Eric Sunshine
  1 sibling, 1 reply; 57+ messages in thread
From: Eric Sunshine @ 2018-08-07 10:23 UTC (permalink / raw)
  To: Phillip Wood; +Cc: Git List, Junio C Hamano, Johannes Schindelin

On Tue, Aug 7, 2018 at 5:35 AM Phillip Wood <phillip.wood@talktalk.net> wrote:
>      - Reverted the implementation to v2 with more robust detection of the
>        missing "'" on the last line of the author script based on a
>        suggestion by Eric. This means that this series needs to progress
>        closely with Eric's series of fixes or the fallback code will never
>        be called.

Thanks for working on this. I haven't read the patch closely yet, but
one thing caught my attention as I ran my eye over it...

> +static int quoting_is_broken(const char *s, size_t n)
> +{
> +       /* Skip any empty lines in case the file was hand edited */
> +       while (n > 0 && s[--n] == '\n')
> +               ; /* empty */
> +       if (n > 0 && s[n] != '\'')
> +               return 1;

To be "technically correct", I think the condition in the 'if'
statement should be ">=". It should never happen in practice that the
entire content of the file is a single character followed by zero or
more newlines, but using the proper condition ">=" would save future
readers of this code a "huh?" moment.

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v4 2/2] sequencer: fix quoting in write_author_script
  2018-08-07 10:23     ` Eric Sunshine
@ 2018-08-07 13:54       ` Phillip Wood
  2018-08-08  8:43         ` Eric Sunshine
  0 siblings, 1 reply; 57+ messages in thread
From: Phillip Wood @ 2018-08-07 13:54 UTC (permalink / raw)
  To: Eric Sunshine, Phillip Wood; +Cc: Git List, Junio C Hamano, Johannes Schindelin

Hi Eric

On 07/08/18 11:23, Eric Sunshine wrote:
> On Tue, Aug 7, 2018 at 5:35 AM Phillip Wood <phillip.wood@talktalk.net> wrote:
>>      - Reverted the implementation to v2 with more robust detection of the
>>        missing "'" on the last line of the author script based on a
>>        suggestion by Eric. This means that this series needs to progress
>>        closely with Eric's series of fixes or the fallback code will never
>>        be called.
> 
> Thanks for working on this. I haven't read the patch closely yet, but
> one thing caught my attention as I ran my eye over it...
> 
>> +static int quoting_is_broken(const char *s, size_t n)
>> +{
>> +       /* Skip any empty lines in case the file was hand edited */
>> +       while (n > 0 && s[--n] == '\n')
>> +               ; /* empty */
>> +       if (n > 0 && s[n] != '\'')
>> +               return 1;
> 
> To be "technically correct", I think the condition in the 'if'
> statement should be ">=". It should never happen in practice that the
> entire content of the file is a single character followed by zero or
> more newlines, but using the proper condition ">=" would save future
> readers of this code a "huh?" moment.
> 

I'm not sure it is that simple. If the script consists solely of a
single quote then we should return 1, if it is a single character that
is not "'" then it should return 0. Currently it returns 0 in both those
cases so is technically broken when the script is "'". If it used ">="
instead then I think it would return 0 when it should return 1 and vice
versa. As you say this shouldn't happen in practice.

Best Wishes

Phillip

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v3 2/2] sequencer: fix quoting in write_author_script
  2018-08-03 14:12             ` Phillip Wood
@ 2018-08-07 17:20               ` Junio C Hamano
  0 siblings, 0 replies; 57+ messages in thread
From: Junio C Hamano @ 2018-08-07 17:20 UTC (permalink / raw)
  To: Phillip Wood
  Cc: Eric Sunshine, Phillip Wood, Alban Gruin, Git List,
	Johannes Schindelin

Phillip Wood <phillip.wood@talktalk.net> writes:

> Yes I think the earlier approach with the more robust detection you
> suggested is probably a good compromise. Junio does that sound good to
> you?

Surely, and thanks.

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v2 0/4] fix "rebase -i --root" corrupting root commit
  2018-08-07  3:31       ` Eric Sunshine
@ 2018-08-07 21:09         ` Junio C Hamano
  2018-08-27 22:34         ` Johannes Schindelin
  1 sibling, 0 replies; 57+ messages in thread
From: Junio C Hamano @ 2018-08-07 21:09 UTC (permalink / raw)
  To: Eric Sunshine
  Cc: Hilco Wijbenga, Git List, Johannes Schindelin, Phillip Wood,
	Akinori MUSHA

Eric Sunshine <sunshine@sunshineco.com> writes:

> On Mon, Aug 6, 2018 at 9:20 PM Hilco Wijbenga <hilco.wijbenga@gmail.com> wrote:
>> But your suggestion did make me think about what behaviour I would
>> like to see, exactly. I like that Git removes commits that no longer
>> serve any purpose (because I've included their changes in earlier
>> commits). So I would not want to keep commits that become empty during
>> the rebase. What I would like to see is that commits that _start out_
>> as empty, are retained. Would such behaviour make sense? Or would that
>> be considered surprising behaviour?
>
> I, personally, have no opinion since I don't use empty commits.
> Perhaps someone more experienced and more long-sighted will chime in.

0661e49a ("git-rebase.txt: document behavioral differences between
modes", 2018-06-27) added the following.  In short, "rebase -i"
should already behave that way.

+ * empty commits:
+
+    am-based rebase will drop any "empty" commits, whether the
+    commit started empty (had no changes relative to its parent to
+    start with) or ended empty (all changes were already applied
+    upstream in other commits).
+
+    merge-based rebase does the same.
+
+    interactive-based rebase will by default drop commits that
+    started empty and halt if it hits a commit that ended up empty.
+    The `--keep-empty` option exists for interactive rebases to allow
+    it to keep commits that started empty.

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v4 2/2] sequencer: fix quoting in write_author_script
  2018-08-07 13:54       ` Phillip Wood
@ 2018-08-08  8:43         ` Eric Sunshine
  2018-08-08 16:01           ` Junio C Hamano
  2018-08-09 10:08           ` Phillip Wood
  0 siblings, 2 replies; 57+ messages in thread
From: Eric Sunshine @ 2018-08-08  8:43 UTC (permalink / raw)
  To: Phillip Wood; +Cc: Git List, Junio C Hamano, Johannes Schindelin

On Tue, Aug 7, 2018 at 9:54 AM Phillip Wood <phillip.wood@talktalk.net> wrote:
> On 07/08/18 11:23, Eric Sunshine wrote:
> > On Tue, Aug 7, 2018 at 5:35 AM Phillip Wood <phillip.wood@talktalk.net> wrote:
> >> +       if (n > 0 && s[n] != '\'')
> >> +               return 1;
> >
> > To be "technically correct", I think the condition in the 'if'
> > statement should be ">=". It should never happen in practice that the
> > entire content of the file is a single character followed by zero or
> > more newlines, but using the proper condition ">=" would save future
> > readers of this code a "huh?" moment.
>
> I'm not sure it is that simple. If the script consists solely of a
> single quote then we should return 1, if it is a single character that
> is not "'" then it should return 0. Currently it returns 0 in both those
> cases so is technically broken when the script is "'". If it used ">="
> instead then I think it would return 0 when it should return 1 and vice
> versa. As you say this shouldn't happen in practice.

It shouldn't happen in practice, but if a human (power user) edits
this file, we shouldn't discount the possibility. However, I'm not so
concerned about quoting_is_broken() returning a meaningful answer in
this case since we have much bigger problems if we get such a file.
(Indeed, what answer could quoting_is_broken() return which would be
useful or meaningful for such a malformed file?)

What does concern me is that read_env_script() doesn't seem to care
about such a malformed file; it doesn't do any validation at all.
Contrast that with read_author_ident() which is pretty strict about
the content it expects to find in the file. So, it might make sense to
upgrade read_env_script() to do some sort of validation on each line
(though that shouldn't be in this patch, and doesn't even need to be
in this series). For instance, it could check that each line looks
something like what would be matched by this regex: /[A-Z0-9_]+='.+'/

(And, no, I'm not saying that regex should be used for validation; I'm
just using it as an example.)

As for '>' vs. '>=', it caused more than a slight hiccup when I was
scanning the code, and I worry that future readers could be similarly
impacted.

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v4 2/2] sequencer: fix quoting in write_author_script
  2018-08-07  9:34   ` [PATCH v4 2/2] sequencer: fix quoting in write_author_script Phillip Wood
  2018-08-07 10:23     ` Eric Sunshine
@ 2018-08-08  9:39     ` Eric Sunshine
  2018-08-09 10:11       ` Phillip Wood
  1 sibling, 1 reply; 57+ messages in thread
From: Eric Sunshine @ 2018-08-08  9:39 UTC (permalink / raw)
  To: Phillip Wood; +Cc: Git List, Junio C Hamano, Johannes Schindelin

On Tue, Aug 7, 2018 at 5:35 AM Phillip Wood <phillip.wood@talktalk.net> wrote:
> Single quotes should be escaped as \' not \\'. The bad quoting breaks
> the interactive version of 'rebase --root' (which is used when there
> is no '--onto' even if the user does not specify --interactive) for
> authors that contain "'" as sq_dequote() called by read_author_ident()
> errors out on the bad quoting.
> [...]
> Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
> ---
> diff --git a/sequencer.c b/sequencer.c
> @@ -636,42 +636,64 @@ static int write_author_script(const char *message)
>  static int read_env_script(struct argv_array *env)
>  {
>         if (strbuf_read_file(&script, rebase_path_author_script(), 256) <= 0)
>                 return -1;

Erm, again, not introduced by this patch, but this is leaking 'script'
in the error path. You had plugged this leak in the previous round but
that fix got lost when you reverted to this simpler approach. Not
critical, though; the leak probably ought to be fixed by a separate
patch anyhow (which doesn't necessarily need to be part of this
series).

> +       /* write_author_script() used to quote incorrectly */
> +       sq_bug = quoting_is_broken(script.buf, script.len);
>         for (p = script.buf; *p; p++)
> -               if (skip_prefix(p, "'\\\\''", (const char **)&p2))
> +               if (sq_bug && skip_prefix(p, "'\\\\''", &p2))
> +                       strbuf_splice(&script, p - script.buf, p2 - p, "'", 1);
> +               else if (skip_prefix(p, "'\\''", &p2))
>                         strbuf_splice(&script, p - script.buf, p2 - p, "'", 1);

The two strbuf_splice() invocations are identical, so an alternate way
of expressing this would be:

     if ((sq_bug && skip_prefix(p, "'\\\\''", &p2)) ||
        skip_prefix(p, "'\\''", &p2))
        strbuf_splice(&script, p - script.buf, p2 - p, "'", 1);

Not necessarily clearer, and certainly not worth a re-roll.

> diff --git a/t/t3404-rebase-interactive.sh b/t/t3404-rebase-interactive.sh
> @@ -1382,9 +1382,21 @@ test_expect_success 'rebase -i --gpg-sign=<key-id> overrides commit.gpgSign' '
>  test_expect_success 'valid author header after --root swap' '
>         rebase_setup_and_clean author-header no-conflict-branch &&
>         set_fake_editor &&
> -       FAKE_LINES="2 1" git rebase -i --root &&
> -       git cat-file commit HEAD^ >out &&
> -       grep "^author ..*> [0-9][0-9]* [-+][0-9][0-9][0-9][0-9]$" out
> +       git commit --amend --author="Au ${SQ}thor <author@example.com>" --no-edit &&
> +       git cat-file commit HEAD | grep ^author >expected &&
> +       FAKE_LINES="5 1" git rebase -i --root &&
> +       git cat-file commit HEAD^ | grep ^author >actual &&
> +       test_cmp expected actual
> +'

It probably would have been clearer to change to this test to use
test_cmp() instead of 'grep' in a separate patch since it's not
directly related to the fixes in this patch, and then to do the
"commit --amend" in this patch. However, probably not worth a re-roll.

> +test_expect_success 'valid author header when author contains single quote' '
> +       rebase_setup_and_clean author-header no-conflict-branch &&
> +       set_fake_editor &&
> +       git commit --amend --author="Au ${SQ}thor <author@example.com>" --no-edit &&
> +       git cat-file commit HEAD | grep ^author >expected &&
> +       FAKE_LINES="2" git rebase -i HEAD~2 &&
> +       git cat-file commit HEAD | grep ^author >actual &&
> +       test_cmp expected actual
>  '

This test is so similar to the one above that it is tempting to try to
refactor the code so the tests can share implementation, however, the
end result would probably be less readable, so they're probably fine
as-is.

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v4 1/2] sequencer: handle errors from read_author_ident()
  2018-08-07  9:34   ` [PATCH v4 1/2] sequencer: handle errors from read_author_ident() Phillip Wood
@ 2018-08-08  9:43     ` Eric Sunshine
  0 siblings, 0 replies; 57+ messages in thread
From: Eric Sunshine @ 2018-08-08  9:43 UTC (permalink / raw)
  To: Phillip Wood; +Cc: Git List, Junio C Hamano, Johannes Schindelin

On Tue, Aug 7, 2018 at 5:35 AM Phillip Wood <phillip.wood@talktalk.net> wrote:
> Check for a NULL return value from read_author_ident() that indicates
> an error. Previously the NULL author was passed to commit_tree() which
> would then fallback to using the default author when creating the new
> commit. This changed the date and potentially the author of the commit
> which corrupted the author data compared to its expected value.
>
> Helped-by: Eric Sunshine <sunshine@sunshineco.com>
> Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
> ---
>     changes since v3:
>      - Implemented the simpler scheme suggested by Eric

This iteration is much nicer, much simpler, and much easier to review. Thanks.

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v4 0/2] fix author-script quoting
  2018-08-07  9:34 ` [PATCH v4 0/2] fix author-script quoting Phillip Wood
  2018-08-07  9:34   ` [PATCH v4 1/2] sequencer: handle errors from read_author_ident() Phillip Wood
  2018-08-07  9:34   ` [PATCH v4 2/2] sequencer: fix quoting in write_author_script Phillip Wood
@ 2018-08-08  9:51   ` Eric Sunshine
  2 siblings, 0 replies; 57+ messages in thread
From: Eric Sunshine @ 2018-08-08  9:51 UTC (permalink / raw)
  To: Phillip Wood; +Cc: Git List, Junio C Hamano, Johannes Schindelin

On Tue, Aug 7, 2018 at 5:35 AM Phillip Wood <phillip.wood@talktalk.net> wrote:
> I've updated these based on Eric's suggestions, hopefully they're good
> to go now. Thanks Eric for you help.

Thanks, I left a few comments on patch 2/2. Aside from the '>' vs.
'>=' issue (over which I lost more than a few minutes cogitating), I
don't think there are any blockers this time around.

There are 2 or 3 things which can be done as follow-ups (indeed, even
the '>' vs. '>=' could be so), but I don't think any of them are so
critical that they need to be part of this series (for instance, the
strbuf leak in the error path has been around for some time, and can
be fixed later).

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v4 2/2] sequencer: fix quoting in write_author_script
  2018-08-08  8:43         ` Eric Sunshine
@ 2018-08-08 16:01           ` Junio C Hamano
  2018-08-09 10:06             ` Phillip Wood
  2018-08-09 10:08           ` Phillip Wood
  1 sibling, 1 reply; 57+ messages in thread
From: Junio C Hamano @ 2018-08-08 16:01 UTC (permalink / raw)
  To: Eric Sunshine; +Cc: Phillip Wood, Git List, Johannes Schindelin

Eric Sunshine <sunshine@sunshineco.com> writes:

> What does concern me is that read_env_script() doesn't seem to care
> about such a malformed file; it doesn't do any validation at all.
> Contrast that with read_author_ident() which is pretty strict about
> the content it expects to find in the file. So, it might make sense to
> upgrade read_env_script() to do some sort of validation on each line
> (though that shouldn't be in this patch, and doesn't even need to be
> in this series).

I do not think it is within the scope of these bugfix patches, but I
tend to agree that in the longer term it would be a good idea to
unify these two helpers that read exactly the same file stored at
rebase_path_author_script(), and make the result stricter, rather
than tightening two helpers independently.


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v4 2/2] sequencer: fix quoting in write_author_script
  2018-08-08 16:01           ` Junio C Hamano
@ 2018-08-09 10:06             ` Phillip Wood
  0 siblings, 0 replies; 57+ messages in thread
From: Phillip Wood @ 2018-08-09 10:06 UTC (permalink / raw)
  To: Junio C Hamano, Eric Sunshine; +Cc: Phillip Wood, Git List, Johannes Schindelin

Hi Junio
On 08/08/18 17:01, Junio C Hamano wrote:
> Eric Sunshine <sunshine@sunshineco.com> writes:
> 
>> What does concern me is that read_env_script() doesn't seem to care
>> about such a malformed file; it doesn't do any validation at all.
>> Contrast that with read_author_ident() which is pretty strict about
>> the content it expects to find in the file. So, it might make sense to
>> upgrade read_env_script() to do some sort of validation on each line
>> (though that shouldn't be in this patch, and doesn't even need to be
>> in this series).
> 
> I do not think it is within the scope of these bugfix patches, but I
> tend to agree that in the longer term it would be a good idea to
> unify these two helpers that read exactly the same file stored at
> rebase_path_author_script(), and make the result stricter, rather
> than tightening two helpers independently.

That's my longer term goal, ideally sharing code with am.

Best Wishes

Phillip

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v4 2/2] sequencer: fix quoting in write_author_script
  2018-08-08  8:43         ` Eric Sunshine
  2018-08-08 16:01           ` Junio C Hamano
@ 2018-08-09 10:08           ` Phillip Wood
  1 sibling, 0 replies; 57+ messages in thread
From: Phillip Wood @ 2018-08-09 10:08 UTC (permalink / raw)
  To: Eric Sunshine, Phillip Wood; +Cc: Git List, Junio C Hamano, Johannes Schindelin

Hi Eric

On 08/08/18 09:43, Eric Sunshine wrote:
> On Tue, Aug 7, 2018 at 9:54 AM Phillip Wood <phillip.wood@talktalk.net> wrote:
>> On 07/08/18 11:23, Eric Sunshine wrote:
>>> On Tue, Aug 7, 2018 at 5:35 AM Phillip Wood <phillip.wood@talktalk.net> wrote:
>>>> +       if (n > 0 && s[n] != '\'')
>>>> +               return 1;
>>>
>>> To be "technically correct", I think the condition in the 'if'
>>> statement should be ">=". It should never happen in practice that the
>>> entire content of the file is a single character followed by zero or
>>> more newlines, but using the proper condition ">=" would save future
>>> readers of this code a "huh?" moment.
>>
>> I'm not sure it is that simple. If the script consists solely of a
>> single quote then we should return 1, if it is a single character that
>> is not "'" then it should return 0. Currently it returns 0 in both those
>> cases so is technically broken when the script is "'". If it used ">="
>> instead then I think it would return 0 when it should return 1 and vice
>> versa. As you say this shouldn't happen in practice.
> 
> It shouldn't happen in practice, but if a human (power user) edits
> this file, we shouldn't discount the possibility. However, I'm not so
> concerned about quoting_is_broken() returning a meaningful answer in
> this case since we have much bigger problems if we get such a file.
> (Indeed, what answer could quoting_is_broken() return which would be
> useful or meaningful for such a malformed file?)
> 
> What does concern me is that read_env_script() doesn't seem to care
> about such a malformed file; it doesn't do any validation at all.
> Contrast that with read_author_ident() which is pretty strict about
> the content it expects to find in the file. So, it might make sense to
> upgrade read_env_script() to do some sort of validation on each line
> (though that shouldn't be in this patch, and doesn't even need to be
> in this series). For instance, it could check that each line looks
> something like what would be matched by this regex: /[A-Z0-9_]+='.+'/

I think it should just share some code with get_author_ident() that uses
sq_dequote(), that's for another day though.

> (And, no, I'm not saying that regex should be used for validation; I'm
> just using it as an example.)
> 
> As for '>' vs. '>=', it caused more than a slight hiccup when I was
> scanning the code, and I worry that future readers could be similarly
> impacted.

I don't have a strong opinion either way, if Junio wants to fix it up
I'd be happy with that, but I'm not keen on another iteration just for that.

Best Wishes

Phillip



^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v4 2/2] sequencer: fix quoting in write_author_script
  2018-08-08  9:39     ` Eric Sunshine
@ 2018-08-09 10:11       ` Phillip Wood
  0 siblings, 0 replies; 57+ messages in thread
From: Phillip Wood @ 2018-08-09 10:11 UTC (permalink / raw)
  To: Eric Sunshine, Phillip Wood; +Cc: Git List, Junio C Hamano, Johannes Schindelin

On 08/08/18 10:39, Eric Sunshine wrote:
> On Tue, Aug 7, 2018 at 5:35 AM Phillip Wood <phillip.wood@talktalk.net> wrote:
>> Single quotes should be escaped as \' not \\'. The bad quoting breaks
>> the interactive version of 'rebase --root' (which is used when there
>> is no '--onto' even if the user does not specify --interactive) for
>> authors that contain "'" as sq_dequote() called by read_author_ident()
>> errors out on the bad quoting.
>> [...]
>> Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
>> ---
>> diff --git a/sequencer.c b/sequencer.c
>> @@ -636,42 +636,64 @@ static int write_author_script(const char *message)
>>  static int read_env_script(struct argv_array *env)
>>  {
>>         if (strbuf_read_file(&script, rebase_path_author_script(), 256) <= 0)
>>                 return -1;
> 
> Erm, again, not introduced by this patch, but this is leaking 'script'
> in the error path. You had plugged this leak in the previous round but
> that fix got lost when you reverted to this simpler approach. Not
> critical, though; the leak probably ought to be fixed by a separate
> patch anyhow (which doesn't necessarily need to be part of this
> series).

I'm hoping this will go away when I work on unifying the code to read
the author-script with am.

>> +       /* write_author_script() used to quote incorrectly */
>> +       sq_bug = quoting_is_broken(script.buf, script.len);
>>         for (p = script.buf; *p; p++)
>> -               if (skip_prefix(p, "'\\\\''", (const char **)&p2))
>> +               if (sq_bug && skip_prefix(p, "'\\\\''", &p2))
>> +                       strbuf_splice(&script, p - script.buf, p2 - p, "'", 1);
>> +               else if (skip_prefix(p, "'\\''", &p2))
>>                         strbuf_splice(&script, p - script.buf, p2 - p, "'", 1);
> 
> The two strbuf_splice() invocations are identical, so an alternate way
> of expressing this would be:
> 
>      if ((sq_bug && skip_prefix(p, "'\\\\''", &p2)) ||
>         skip_prefix(p, "'\\''", &p2))
>         strbuf_splice(&script, p - script.buf, p2 - p, "'", 1);
> 
> Not necessarily clearer, and certainly not worth a re-roll.
> 
>> diff --git a/t/t3404-rebase-interactive.sh b/t/t3404-rebase-interactive.sh
>> @@ -1382,9 +1382,21 @@ test_expect_success 'rebase -i --gpg-sign=<key-id> overrides commit.gpgSign' '
>>  test_expect_success 'valid author header after --root swap' '
>>         rebase_setup_and_clean author-header no-conflict-branch &&
>>         set_fake_editor &&
>> -       FAKE_LINES="2 1" git rebase -i --root &&
>> -       git cat-file commit HEAD^ >out &&
>> -       grep "^author ..*> [0-9][0-9]* [-+][0-9][0-9][0-9][0-9]$" out
>> +       git commit --amend --author="Au ${SQ}thor <author@example.com>" --no-edit &&
>> +       git cat-file commit HEAD | grep ^author >expected &&
>> +       FAKE_LINES="5 1" git rebase -i --root &&
>> +       git cat-file commit HEAD^ | grep ^author >actual &&
>> +       test_cmp expected actual
>> +'
> 
> It probably would have been clearer to change to this test to use
> test_cmp() instead of 'grep' in a separate patch since it's not
> directly related to the fixes in this patch, and then to do the
> "commit --amend" in this patch. However, probably not worth a re-roll.

In hindsight that would have been clearer, but hopefully this version
isn't too confusing

>> +test_expect_success 'valid author header when author contains single quote' '
>> +       rebase_setup_and_clean author-header no-conflict-branch &&
>> +       set_fake_editor &&
>> +       git commit --amend --author="Au ${SQ}thor <author@example.com>" --no-edit &&
>> +       git cat-file commit HEAD | grep ^author >expected &&
>> +       FAKE_LINES="2" git rebase -i HEAD~2 &&
>> +       git cat-file commit HEAD | grep ^author >actual &&
>> +       test_cmp expected actual
>>  '
> 
> This test is so similar to the one above that it is tempting to try to
> refactor the code so the tests can share implementation, however, the
> end result would probably be less readable, so they're probably fine
> as-is.

Yes, I think it could look messy

Thank you very much for all your help and comments on this series

Best Wishes

Phillip


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v2 0/4] fix "rebase -i --root" corrupting root commit
  2018-08-07  3:31       ` Eric Sunshine
  2018-08-07 21:09         ` Junio C Hamano
@ 2018-08-27 22:34         ` Johannes Schindelin
  1 sibling, 0 replies; 57+ messages in thread
From: Johannes Schindelin @ 2018-08-27 22:34 UTC (permalink / raw)
  To: Eric Sunshine
  Cc: Hilco Wijbenga, Git List, Junio C Hamano, Phillip Wood,
	Akinori MUSHA

Hi,

On Mon, 6 Aug 2018, Eric Sunshine wrote:

> On Mon, Aug 6, 2018 at 9:20 PM Hilco Wijbenga <hilco.wijbenga@gmail.com> wrote:
> > But your suggestion did make me think about what behaviour I would
> > like to see, exactly. I like that Git removes commits that no longer
> > serve any purpose (because I've included their changes in earlier
> > commits). So I would not want to keep commits that become empty during
> > the rebase. What I would like to see is that commits that _start out_
> > as empty, are retained. Would such behaviour make sense? Or would that
> > be considered surprising behaviour?
> 
> I, personally, have no opinion since I don't use empty commits.
> Perhaps someone more experienced and more long-sighted will chime in.

I got aware about two weeks ago that my mail provider fails to deliver all
the mails that are addressed to me. Two days ago, I managed to get this
mail (and the thread) via public-inbox (thanks, Eric!!!). Hence my late
reply.

Hilco: if you provide a patch to extend the test suite to demonstrate the
breakage (via `test_expect_failure`), I promise to try my best to provide
a fix on top.

Ciao,
Johannes

^ permalink raw reply	[flat|nested] 57+ messages in thread

end of thread, other threads:[~2018-08-27 22:34 UTC | newest]

Thread overview: 57+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-07-31  7:33 [PATCH v2 0/4] fix "rebase -i --root" corrupting root commit Eric Sunshine
2018-07-31  7:33 ` [PATCH v2 1/4] sequencer: fix "rebase -i --root" corrupting author header Eric Sunshine
2018-07-31  7:33 ` [PATCH v2 2/4] sequencer: fix "rebase -i --root" corrupting author header timezone Eric Sunshine
2018-07-31  9:50   ` Phillip Wood
2018-07-31 10:15     ` Eric Sunshine
2018-07-31  7:33 ` [PATCH v2 3/4] sequencer: fix "rebase -i --root" corrupting author header timestamp Eric Sunshine
2018-07-31 10:00   ` Phillip Wood
2018-07-31 10:30     ` Eric Sunshine
2018-07-31  7:33 ` [PATCH v2 4/4] sequencer: don't die() on bogus user-edited timestamp Eric Sunshine
2018-07-31 10:02   ` Phillip Wood
2018-07-31 10:38     ` Eric Sunshine
2018-07-31 10:05 ` [PATCH v2 0/4] fix "rebase -i --root" corrupting root commit Phillip Wood
2018-07-31 10:46   ` Eric Sunshine
2018-07-31 11:19     ` Phillip Wood
2018-07-31 11:27     ` Eric Sunshine
2018-07-31 11:15 ` [PATCH v2 0/2] Fix author script quoting Phillip Wood
2018-07-31 11:15   ` [PATCH v2 1/2] sequencer: handle errors in read_author_ident() Phillip Wood
2018-07-31 20:47     ` Eric Sunshine
2018-08-01  9:28       ` Phillip Wood
2018-07-31 11:15   ` [PATCH v2 2/2] sequencer: fix quoting in write_author_script Phillip Wood
2018-07-31 21:39     ` Eric Sunshine
2018-08-01 10:24       ` Phillip Wood
2018-08-01 15:22         ` Junio C Hamano
2018-08-01 15:50       ` Phillip Wood
2018-08-01 19:19         ` Eric Sunshine
2018-08-01  1:30 ` [PATCH v2 0/4] fix "rebase -i --root" corrupting root commit Hilco Wijbenga
2018-08-01  6:22   ` Eric Sunshine
2018-08-07  1:19     ` Hilco Wijbenga
2018-08-07  3:31       ` Eric Sunshine
2018-08-07 21:09         ` Junio C Hamano
2018-08-27 22:34         ` Johannes Schindelin
2018-08-01 23:25 ` brian m. carlson
2018-08-02  8:09   ` Eric Sunshine
2018-08-02 11:20 ` [PATCH v3 0/2] Fix author script quoting Phillip Wood
2018-08-02 11:20   ` [PATCH v3 1/2] sequencer: handle errors in read_author_ident() Phillip Wood
2018-08-03  7:09     ` Eric Sunshine
2018-08-03 15:53       ` Junio C Hamano
2018-08-02 11:20   ` [PATCH v3 2/2] sequencer: fix quoting in write_author_script Phillip Wood
2018-08-02 17:27     ` Junio C Hamano
2018-08-03  7:59       ` Eric Sunshine
2018-08-03  9:33         ` Phillip Wood
2018-08-03 10:02           ` Eric Sunshine
2018-08-03 14:12             ` Phillip Wood
2018-08-07 17:20               ` Junio C Hamano
2018-08-07  9:34 ` [PATCH v4 0/2] fix author-script quoting Phillip Wood
2018-08-07  9:34   ` [PATCH v4 1/2] sequencer: handle errors from read_author_ident() Phillip Wood
2018-08-08  9:43     ` Eric Sunshine
2018-08-07  9:34   ` [PATCH v4 2/2] sequencer: fix quoting in write_author_script Phillip Wood
2018-08-07 10:23     ` Eric Sunshine
2018-08-07 13:54       ` Phillip Wood
2018-08-08  8:43         ` Eric Sunshine
2018-08-08 16:01           ` Junio C Hamano
2018-08-09 10:06             ` Phillip Wood
2018-08-09 10:08           ` Phillip Wood
2018-08-08  9:39     ` Eric Sunshine
2018-08-09 10:11       ` Phillip Wood
2018-08-08  9:51   ` [PATCH v4 0/2] fix author-script quoting Eric Sunshine

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).