git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [PATCH 0/3] Teach ref-filter API to correctly handle CRLF in messages
@ 2020-03-08 18:29 Philippe Blain via GitGitGadget
  2020-03-08 18:29 ` [PATCH 1/3] t: add lib-crlf-messages.sh for messages containing CRLF Philippe Blain via GitGitGadget
                   ` (4 more replies)
  0 siblings, 5 replies; 38+ messages in thread
From: Philippe Blain via GitGitGadget @ 2020-03-08 18:29 UTC (permalink / raw)
  To: git
  Cc: Michael J Gruber, Matthieu Moy, John Keeping, Karthik Nayak,
	Jeff King, Alex Henrie, Philippe Blain

The ref-filter API does not correctly handle commit or tag messages that use
CRLF as the line terminator. Such messages can be created with the--verbatim 
option of git commit and git tag, or by using git commit-tree directly.

This impacts the output of git branch -v, and git branch, git tag and git
for-each-ref when used with a --format argument containing the atoms 
%(contents:subject) or %(contents:body).

The first patch in this series adds a new test library, 
t/lib-crlf-messages.sh, that contains functions to test this behaviour for
commands using either the ref-filter or the pretty API to display messages. 

The second patch fixes the bug in ref-filter.c and adds corresponding tests.

Finally, the third patch adds a test that checks the behaviour of git log in
the presence of CRLF in messages, to prevent futur regressions.

Philippe Blain (3):
  t: add lib-crlf-messages.sh for messages containing CRLF
  ref-filter: teach the API to correctly handle CRLF
  log: add tests for messages containing CRLF to t4202

 ref-filter.c             | 19 +++++++++--
 t/lib-crlf-messages.sh   | 73 ++++++++++++++++++++++++++++++++++++++++
 t/t3203-branch-output.sh | 26 +++++++++++---
 t/t4202-log.sh           | 24 +++++++++++++
 t/t6300-for-each-ref.sh  |  5 +++
 t/t7004-tag.sh           |  7 ++++
 6 files changed, 147 insertions(+), 7 deletions(-)
 create mode 100644 t/lib-crlf-messages.sh


base-commit: 076cbdcd739aeb33c1be87b73aebae5e43d7bcc5
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-576%2Fphil-blain%2Ffix-branch-verbose-crlf-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-576/phil-blain/fix-branch-verbose-crlf-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/576
-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 38+ messages in thread

* [PATCH 1/3] t: add lib-crlf-messages.sh for messages containing CRLF
  2020-03-08 18:29 [PATCH 0/3] Teach ref-filter API to correctly handle CRLF in messages Philippe Blain via GitGitGadget
@ 2020-03-08 18:29 ` Philippe Blain via GitGitGadget
  2020-03-08 18:29 ` [PATCH 2/3] ref-filter: teach the API to correctly handle CRLF Philippe Blain via GitGitGadget
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 38+ messages in thread
From: Philippe Blain via GitGitGadget @ 2020-03-08 18:29 UTC (permalink / raw)
  To: git
  Cc: Michael J Gruber, Matthieu Moy, John Keeping, Karthik Nayak,
	Jeff King, Alex Henrie, Philippe Blain, Philippe Blain

From: Philippe Blain <levraiphilippeblain@gmail.com>

A following commit will fix a bug in the ref-filter API that causes
commit and tag messages containing CRLF to be incorrectly parsed and
displayed.

Add a test library (t/lib-crlf-messages.sh) that creates refs with such
commit messages, so that we can easily test that this bug does not
appear in other commands in the future.

The function test_crlf_subject_body_and_contents can be used to test
that the `--format` option of `branch`, `tag`, `for-each-ref` and
`log` correctly displays the subject, body and raw body of commits and
tag messages.

The commits are created using `commit-tree` such that the current branch
in the test repository is not affected when `test_create_crlf_refs` is
called in a test. This is to done so that the CRLF tests can be inserted
anywhere in a test script where it makes sense to do so, without having
to potentially modify further test that depend on output that would be
modified if the current branch gained new commits.

Signed-off-by: Philippe Blain <levraiphilippeblain@gmail.com>
---
 t/lib-crlf-messages.sh | 73 ++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 73 insertions(+)
 create mode 100644 t/lib-crlf-messages.sh

diff --git a/t/lib-crlf-messages.sh b/t/lib-crlf-messages.sh
new file mode 100644
index 00000000000..64d2ad12019
--- /dev/null
+++ b/t/lib-crlf-messages.sh
@@ -0,0 +1,73 @@
+# Setup refs with commit and tag messages containing CRLF
+
+create_crlf_ref () {
+	message="$1" &&
+	subject="$2" &&
+	body="$3" &&
+	branch="$4" &&
+	printf "${message}" >.crlf-message-${branch}.txt &&
+	printf "${subject}" >.crlf-subject-${branch}.txt &&
+	printf "${body}" >.crlf-body-${branch}.txt &&
+    test_tick &&
+	hash=$(git commit-tree HEAD^{tree} -p HEAD -F .crlf-message-${branch}.txt) &&
+	git branch ${branch} ${hash} &&
+	git tag tag-${branch} ${branch} -F .crlf-message-${branch}.txt --cleanup=verbatim
+}
+
+create_crlf_refs () {
+	message="Subject first line\r\n\r\nBody first line\r\nBody second line\r\n" &&
+	body="Body first line\r\nBody second line\r\n" &&
+	subject="Subject first line" &&
+	create_crlf_ref "${message}" "${subject}" "${body}" "crlf" &&
+	message="Subject first line\r\n\r\n\r\nBody first line\r\nBody second line\r\n" &&
+	create_crlf_ref "${message}" "${subject}" "${body}" "crlf-empty-lines-after-subject" &&
+	message="Subject first line\r\nSubject second line\r\n\r\nBody first line\r\nBody second line\r\n" &&
+	subject="Subject first line Subject second line" &&
+	create_crlf_ref "${message}" "${subject}" "${body}" "crlf-two-line-subject"
+}
+
+test_create_crlf_refs () {
+	test_expect_success 'setup refs with CRLF commit messages' '
+		create_crlf_refs
+	'
+}
+
+cleanup_crlf_refs () {
+	for branch in crlf crlf-empty-lines-after-subject crlf-two-line-subject; do
+		git branch -D ${branch} &&
+		git tag -d tag-${branch} &&
+		rm .crlf-message-${branch}.txt &&
+		rm .crlf-subject-${branch}.txt &&
+		rm .crlf-body-${branch}.txt
+	done
+}
+
+test_cleanup_crlf_refs () {
+	test_expect_success 'clenup refs with CRLF commit messages' '
+		cleanup_crlf_refs
+	'
+}
+
+test_crlf_subject_body_and_contents() {
+	command_and_args="$@" &&
+	command=$1 &&
+	if [ ${command} = "branch" ] || [ ${command} = "for-each-ref" ] || [ ${command} = "tag" ]; then
+		atoms="(contents:subject) (contents:body) (contents)"
+	elif [ ${command} = "log" ]; then
+		atoms="s b B"
+	fi &&
+	files="subject body message" &&
+	while  [ -n "${atoms}" ]; do
+		set ${atoms} && atom=$1 && shift && atoms="$*" &&
+		set ${files} &&	file=$1 && shift && files="$*" &&
+		test_expect_success "${command}: --format='%${atom}' works with CRLF input" "
+			rm -f expect &&
+			for ref in crlf crlf-empty-lines-after-subject crlf-two-line-subject; do
+				cat .crlf-${file}-\"\${ref}\".txt >>expect &&
+				printf \"\n\" >>expect
+			done &&
+			git $command_and_args --format=\"%${atom}\" >actual &&
+			test_cmp expect actual
+		"
+	done
+}
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH 2/3] ref-filter: teach the API to correctly handle CRLF
  2020-03-08 18:29 [PATCH 0/3] Teach ref-filter API to correctly handle CRLF in messages Philippe Blain via GitGitGadget
  2020-03-08 18:29 ` [PATCH 1/3] t: add lib-crlf-messages.sh for messages containing CRLF Philippe Blain via GitGitGadget
@ 2020-03-08 18:29 ` Philippe Blain via GitGitGadget
  2020-03-08 18:29 ` [PATCH 3/3] log: add tests for messages containing CRLF to t4202 Philippe Blain via GitGitGadget
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 38+ messages in thread
From: Philippe Blain via GitGitGadget @ 2020-03-08 18:29 UTC (permalink / raw)
  To: git
  Cc: Michael J Gruber, Matthieu Moy, John Keeping, Karthik Nayak,
	Jeff King, Alex Henrie, Philippe Blain, Philippe Blain

From: Philippe Blain <levraiphilippeblain@gmail.com>

The ref-filter API does not correctly handle commit or tag messages that
use CRLF as the line terminator. Such messages can be created with the
`--verbatim` option of `git commit` and `git tag`, or by using `git
commit-tree` directly.

This impacts the output of `git branch -v`, and `git branch`, `git
tag` and `git for-each-ref` when used with a `--format` argument
containing the atoms `%(contents:subject)` or `%(contents:body)`.

Fix this bug in ref-filter by improving the logic in `copy_subject` and
`find_subpos`.

Add tests for `branch`, `tag` and `for-each-ref` using
lib-crlf-messages.sh. The 'make commits' test at the beginning of
t3203-branch-output.sh needs to be modified since it did not use
`test_tick` and thus the commit hashes were not reproducible. For
simplicity, use `test_commit` as the content and name of the files
created in this setup test are irrelevant to the rest of the test
script.

Signed-off-by: Philippe Blain <levraiphilippeblain@gmail.com>
---
 ref-filter.c             | 19 +++++++++++++++++--
 t/t3203-branch-output.sh | 26 +++++++++++++++++++++-----
 t/t6300-for-each-ref.sh  |  5 +++++
 t/t7004-tag.sh           |  7 +++++++
 4 files changed, 50 insertions(+), 7 deletions(-)

diff --git a/ref-filter.c b/ref-filter.c
index 79bb5206783..537cc4de42c 100644
--- a/ref-filter.c
+++ b/ref-filter.c
@@ -1050,10 +1050,18 @@ static char *copy_subject(const char *buf, unsigned long len)
 {
 	char *r = xmemdupz(buf, len);
 	int i;
+	struct strbuf sb = STRBUF_INIT;
 
-	for (i = 0; i < len; i++)
+	strbuf_attach(&sb, r, len, len + 1);
+	for (i = 0; i < sb.len; i++) {
 		if (r[i] == '\n')
 			r[i] = ' ';
+		if (r[i] == '\r') {
+			strbuf_remove(&sb, i, 1);
+			i -= 1;
+		}
+	}
+	strbuf_detach(&sb, NULL);
 
 	return r;
 }
@@ -1184,15 +1192,22 @@ static void find_subpos(const char *buf,
 		eol = strchrnul(buf, '\n');
 		if (*eol)
 			eol++;
+		/*  protect against messages that might contain \r\n */
+		if (*eol == '\r')
+			eol++;
 		buf = eol;
 	}
 	*sublen = buf - *sub;
 	/* drop trailing newline, if present */
 	if (*sublen && (*sub)[*sublen - 1] == '\n')
 		*sublen -= 1;
+	/*  protect against commit messages that might contain \r\n */
+	else if (*sublen && (*sub)[*sublen - 1] == '\r')
+		*sublen -= 3; /* drop '\r\n\r' */
 
 	/* skip any empty lines */
-	while (*buf == '\n')
+	/* and protect against commit messages that might contain \r\n */
+	while (*buf == '\n' || *buf == '\r')
 		buf++;
 	*body = buf;
 	*bodylen = strlen(buf);
diff --git a/t/t3203-branch-output.sh b/t/t3203-branch-output.sh
index 71818b90f00..1235089619c 100755
--- a/t/t3203-branch-output.sh
+++ b/t/t3203-branch-output.sh
@@ -3,13 +3,11 @@
 test_description='git branch display tests'
 . ./test-lib.sh
 . "$TEST_DIRECTORY"/lib-terminal.sh
+. "$TEST_DIRECTORY"/lib-crlf-messages.sh
 
 test_expect_success 'make commits' '
-	echo content >file &&
-	git add file &&
-	git commit -m one &&
-	echo content >>file &&
-	git commit -a -m two
+	test_commit one &&
+	test_commit two
 '
 
 test_expect_success 'make branches' '
@@ -95,6 +93,24 @@ test_expect_success 'git branch --ignore-case --list -v pattern shows branch sum
 	awk "{print \$NF}" <tmp >actual &&
 	test_cmp expect actual
 '
+test_create_crlf_refs
+
+test_expect_success 'git branch -v works with CRLF input' '
+	cat >expect <<-EOF &&
+	  branch-one                     139b20d two
+	  branch-two                     d79ce16 one
+	  crlf                           2113b0e Subject first line
+	  crlf-empty-lines-after-subject 0a9530d Subject first line
+	  crlf-two-line-subject          f9ded1f Subject first line Subject second line
+	* master                         139b20d two
+	EOF
+	git branch -v >actual &&
+	test_cmp expect actual
+'
+
+test_crlf_subject_body_and_contents branch --list crlf*
+
+test_cleanup_crlf_refs
 
 test_expect_success 'git branch -v pattern does not show branch summaries' '
 	test_must_fail git branch -v branch*
diff --git a/t/t6300-for-each-ref.sh b/t/t6300-for-each-ref.sh
index 9c910ce7467..467924de3df 100755
--- a/t/t6300-for-each-ref.sh
+++ b/t/t6300-for-each-ref.sh
@@ -8,6 +8,7 @@ test_description='for-each-ref test'
 . ./test-lib.sh
 . "$TEST_DIRECTORY"/lib-gpg.sh
 . "$TEST_DIRECTORY"/lib-terminal.sh
+. "$TEST_DIRECTORY"/lib-crlf-messages.sh
 
 # Mon Jul 3 23:18:43 2006 +0000
 datestamp=1151968723
@@ -888,4 +889,8 @@ test_expect_success 'for-each-ref --ignore-case ignores case' '
 	test_cmp expect actual
 '
 
+test_create_crlf_refs
+
+test_crlf_subject_body_and_contents for-each-ref refs/heads/crlf*
+
 test_done
diff --git a/t/t7004-tag.sh b/t/t7004-tag.sh
index 6db92bd3ba6..25f465f34ea 100755
--- a/t/t7004-tag.sh
+++ b/t/t7004-tag.sh
@@ -10,6 +10,7 @@ Tests for operations with tags.'
 . ./test-lib.sh
 . "$TEST_DIRECTORY"/lib-gpg.sh
 . "$TEST_DIRECTORY"/lib-terminal.sh
+. "$TEST_DIRECTORY"/lib-crlf-messages.sh
 
 # creating and listing lightweight tags:
 
@@ -1969,6 +1970,12 @@ test_expect_success '--format should list tags as per format given' '
 	test_cmp expect actual
 '
 
+test_create_crlf_refs
+
+test_crlf_subject_body_and_contents tag --list tag-crlf*
+
+test_cleanup_crlf_refs
+
 test_expect_success "set up color tests" '
 	echo "<RED>v1.0<RESET>" >expect.color &&
 	echo "v1.0" >expect.bare &&
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH 3/3] log: add tests for messages containing CRLF to t4202
  2020-03-08 18:29 [PATCH 0/3] Teach ref-filter API to correctly handle CRLF in messages Philippe Blain via GitGitGadget
  2020-03-08 18:29 ` [PATCH 1/3] t: add lib-crlf-messages.sh for messages containing CRLF Philippe Blain via GitGitGadget
  2020-03-08 18:29 ` [PATCH 2/3] ref-filter: teach the API to correctly handle CRLF Philippe Blain via GitGitGadget
@ 2020-03-08 18:29 ` Philippe Blain via GitGitGadget
  2020-03-09 15:14 ` [PATCH 0/3] Teach ref-filter API to correctly handle CRLF in messages Junio C Hamano
  2020-03-10  2:24 ` [PATCH v2 " Philippe Blain via GitGitGadget
  4 siblings, 0 replies; 38+ messages in thread
From: Philippe Blain via GitGitGadget @ 2020-03-08 18:29 UTC (permalink / raw)
  To: git
  Cc: Michael J Gruber, Matthieu Moy, John Keeping, Karthik Nayak,
	Jeff King, Alex Henrie, Philippe Blain, Philippe Blain

From: Philippe Blain <levraiphilippeblain@gmail.com>

A previous commit fixed a bug in the ref-filter API causing messages
containing CRLF to be incorrectly parsed and displayed.

Add a test to also check that `git log` correctly handles such messages,
to prevent futur regressions.

Signed-off-by: Philippe Blain <levraiphilippeblain@gmail.com>
---
 t/t4202-log.sh | 24 ++++++++++++++++++++++++
 1 file changed, 24 insertions(+)

diff --git a/t/t4202-log.sh b/t/t4202-log.sh
index 4694b6d0ce7..1e6149372c5 100755
--- a/t/t4202-log.sh
+++ b/t/t4202-log.sh
@@ -6,6 +6,7 @@ test_description='git log'
 . "$TEST_DIRECTORY/lib-gpg.sh"
 . "$TEST_DIRECTORY/lib-terminal.sh"
 . "$TEST_DIRECTORY/lib-log-graph.sh"
+. "$TEST_DIRECTORY/lib-crlf-messages.sh"
 
 test_cmp_graph () {
 	lib_test_cmp_graph --format=%s "$@"
@@ -105,6 +106,29 @@ test_expect_success 'oneline' '
 	test_cmp expect actual
 '
 
+test_create_crlf_refs
+
+test_expect_success 'oneline with CRLF messages' '
+	echo "002093e Subject first line" >expect &&
+	git log --oneline -1 crlf >actual-branch &&
+	git log --oneline -1 tag-crlf >actual-tag &&
+	test_cmp expect actual-branch &&
+	test_cmp expect actual-tag &&
+	echo "6f814b0 Subject first line" >expect &&
+	git log --oneline -1 crlf-empty-lines-after-subject >actual-branch &&
+	git log --oneline -1 tag-crlf-empty-lines-after-subject >actual-tag &&
+	test_cmp expect actual-branch &&
+	test_cmp expect actual-tag &&
+	echo "8c58a85 Subject first line Subject second line" >expect &&
+	git log --oneline -1 crlf-two-line-subject >actual-branch &&
+	git log --oneline -1 tag-crlf-two-line-subject >actual-tag &&
+	test_cmp expect actual-branch &&
+	test_cmp expect actual-tag
+'
+test_crlf_subject_body_and_contents log --all --reverse --grep Subject
+
+test_cleanup_crlf_refs
+
 test_expect_success 'diff-filter=A' '
 
 	git log --no-renames --pretty="format:%s" --diff-filter=A HEAD > actual &&
-- 
gitgitgadget

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* Re: [PATCH 0/3] Teach ref-filter API to correctly handle CRLF in messages
  2020-03-08 18:29 [PATCH 0/3] Teach ref-filter API to correctly handle CRLF in messages Philippe Blain via GitGitGadget
                   ` (2 preceding siblings ...)
  2020-03-08 18:29 ` [PATCH 3/3] log: add tests for messages containing CRLF to t4202 Philippe Blain via GitGitGadget
@ 2020-03-09 15:14 ` Junio C Hamano
  2020-03-10  2:19   ` Philippe Blain
  2020-03-10  2:24 ` [PATCH v2 " Philippe Blain via GitGitGadget
  4 siblings, 1 reply; 38+ messages in thread
From: Junio C Hamano @ 2020-03-09 15:14 UTC (permalink / raw)
  To: Philippe Blain via GitGitGadget
  Cc: git, Michael J Gruber, Matthieu Moy, John Keeping, Karthik Nayak,
	Jeff King, Alex Henrie, Philippe Blain

"Philippe Blain via GitGitGadget" <gitgitgadget@gmail.com> writes:

> The ref-filter API does not correctly handle commit or tag messages that use
> CRLF as the line terminator. Such messages can be created with the--verbatim 
> option of git commit and git tag, or by using git commit-tree directly.
>
> This impacts the output of git branch -v, and git branch, git tag and git
> for-each-ref when used with a --format argument containing the atoms 
> %(contents:subject) or %(contents:body).

What is missing from the above is the definition of "correct".
Without saying what you consider in the current behaviour is
"incorrect" and why, the first sentence does not give the right
information to readers.

Let me speculate why "such messages" are created by end users using
the --verbatim option.  They probably have unusual message that
needs to have whitespaces and characters that are stripped without
the option at the end of the lines in the message, and they
explicitly ask Git not to lose them with the option.  Perhaps CR may
be among those that they want to retain.

So "git for-each-ref --format='%(contents:body)'" should retain
these CRs and whitespaces at the end of these lines, but you are
reporting an incorrect behaviour, so perhaps somehow we lose them?

That does indeed sound like a bug.  We should keep them in the
output if the user took pains to deliberately place them in the
commit.  Unless the user tells us to clean/hide them with an option,
that is.

But as you did not say what your definition of correctness is in the
above two paragraphs, it is not clear to readers if that is the bug
you are trying to address, or even if such a bug exists in the first
place.

(haven't read the remainder of the cover or the patches yet).

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 0/3] Teach ref-filter API to correctly handle CRLF in messages
  2020-03-09 15:14 ` [PATCH 0/3] Teach ref-filter API to correctly handle CRLF in messages Junio C Hamano
@ 2020-03-10  2:19   ` Philippe Blain
  0 siblings, 0 replies; 38+ messages in thread
From: Philippe Blain @ 2020-03-10  2:19 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Philippe Blain via GitGitGadget, git, Michael J Gruber,
	Matthieu Moy, John Keeping, Karthik Nayak, Jeff King, Alex Henrie

Hi Junio,

> Le 9 mars 2020 à 11:14, Junio C Hamano <gitster@pobox.com> a écrit :
> 
> "Philippe Blain via GitGitGadget" <gitgitgadget@gmail.com> writes:
> 
>> The ref-filter API does not correctly handle commit or tag messages that use
>> CRLF as the line terminator. Such messages can be created with the--verbatim 
>> option of git commit and git tag, or by using git commit-tree directly.
>> 
>> This impacts the output of git branch -v, and git branch, git tag and git
>> for-each-ref when used with a --format argument containing the atoms 
>> %(contents:subject) or %(contents:body).
> 
> What is missing from the above is the definition of "correct".
> Without saying what you consider in the current behaviour is
> "incorrect" and why, the first sentence does not give the right
> information to readers.

You are right, I'll send a v2 shortly with an updated cover letter and commit message.

> Let me speculate why "such messages" are created by end users using
> the --verbatim option.  They probably have unusual message that
> needs to have whitespaces and characters that are stripped without
> the option at the end of the lines in the message, and they
> explicitly ask Git not to lose them with the option.  Perhaps CR may
> be among those that they want to retain.

That's possible. In my case what made me discover the bug is our old version of GitLab at work,
in which messages for merge commit created using the web UI contain CRLF [1].

[1] https://gitlab.com/gitlab-org/gitlab-foss/-/issues/31671



^ permalink raw reply	[flat|nested] 38+ messages in thread

* [PATCH v2 0/3] Teach ref-filter API to correctly handle CRLF in messages
  2020-03-08 18:29 [PATCH 0/3] Teach ref-filter API to correctly handle CRLF in messages Philippe Blain via GitGitGadget
                   ` (3 preceding siblings ...)
  2020-03-09 15:14 ` [PATCH 0/3] Teach ref-filter API to correctly handle CRLF in messages Junio C Hamano
@ 2020-03-10  2:24 ` Philippe Blain via GitGitGadget
  2020-03-10  2:24   ` [PATCH v2 1/3] t: add lib-crlf-messages.sh for messages containing CRLF Philippe Blain via GitGitGadget
                     ` (5 more replies)
  4 siblings, 6 replies; 38+ messages in thread
From: Philippe Blain via GitGitGadget @ 2020-03-10  2:24 UTC (permalink / raw)
  To: git
  Cc: Michael J Gruber, Matthieu Moy, John Keeping, Karthik Nayak,
	Jeff King, Alex Henrie, Philippe Blain

The ref-filter API does not correctly handle commit or tag messages that use
CRLF as the line terminator. Such messages can be created with the--verbatim 
option of git commit and git tag, or by using git commit-tree directly.

This impacts the output git branch, git tag and git for-each-refwhen used
with a --format argument containing the atoms%(contents:subject) or 
%(contents:body), as well as the output ofgit branch --verbose, which uses 
%(contents:subject) internally.

The function find_subpos in ref-filter.c looks for two consecutive '\n' to
find the end of the subject line, a sequence which is absent in messages
using CRLF. This results in the whole message being parsed as the subject
line (%(contents:subject)), and the body of the message (%(contents:body))
being empty.

Moreover, in copy_subject, '\n' is replaced by space, but '\r' is untouched,
resulting in the escape sequence '^M' being output verbatim in most terminal
emulators:

$ git branch --verbose
* crlf    2113b0e Subject first line^M ^M Body first line^M Body second line

This bug is a regression for git branch --verbose, which bisects down to
949af06 (branch: use ref-filter printing APIs, 2017-01-10).

The first patch in this series adds a new test library, 
t/lib-crlf-messages.sh, that contains functions to test this behaviour for
commands using either the ref-filter or the pretty API to display messages. 

The second patch fixes the bug in ref-filter.c and adds corresponding tests.

Finally, the third patch adds a test that checks the behaviour of git log in
the presence of CRLF in messages, to prevent futur regressions.

changes since v1:

 * improved the cover letter and the commit message of the 2nd patch to
   actually describe the bug this series is fixing


----------------------------------------------------------------------------

v1:

The ref-filter API does not correctly handle commit or tag messages that use
CRLF as the line terminator. Such messages can be created with the--verbatim 
option of git commit and git tag, or by using git commit-tree directly.

This impacts the output of git branch -v, and git branch, git tag and git
for-each-ref when used with a --format argument containing the atoms 
%(contents:subject) or %(contents:body).

The first patch in this series adds a new test library, 
t/lib-crlf-messages.sh, that contains functions to test this behaviour for
commands using either the ref-filter or the pretty API to display messages. 

The second patch fixes the bug in ref-filter.c and adds corresponding tests.

Finally, the third patch adds a test that checks the behaviour of git log in
the presence of CRLF in messages, to prevent futur regressions.

Philippe Blain (3):
  t: add lib-crlf-messages.sh for messages containing CRLF
  ref-filter: fix the API to correctly handle CRLF
  log: add tests for messages containing CRLF to t4202

 ref-filter.c             | 19 +++++++++--
 t/lib-crlf-messages.sh   | 73 ++++++++++++++++++++++++++++++++++++++++
 t/t3203-branch-output.sh | 26 +++++++++++---
 t/t4202-log.sh           | 24 +++++++++++++
 t/t6300-for-each-ref.sh  |  5 +++
 t/t7004-tag.sh           |  7 ++++
 6 files changed, 147 insertions(+), 7 deletions(-)
 create mode 100644 t/lib-crlf-messages.sh


base-commit: 076cbdcd739aeb33c1be87b73aebae5e43d7bcc5
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-576%2Fphil-blain%2Ffix-branch-verbose-crlf-v2
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-576/phil-blain/fix-branch-verbose-crlf-v2
Pull-Request: https://github.com/gitgitgadget/git/pull/576

Range-diff vs v1:

 1:  b2521138035 = 1:  b2521138035 t: add lib-crlf-messages.sh for messages containing CRLF
 2:  c68bc2b3788 ! 2:  aab1f45ba97 ref-filter: teach the API to correctly handle CRLF
     @@ -1,26 +1,49 @@
      Author: Philippe Blain <levraiphilippeblain@gmail.com>
      
     -    ref-filter: teach the API to correctly handle CRLF
     +    ref-filter: fix the API to correctly handle CRLF
      
          The ref-filter API does not correctly handle commit or tag messages that
          use CRLF as the line terminator. Such messages can be created with the
          `--verbatim` option of `git commit` and `git tag`, or by using `git
          commit-tree` directly.
      
     -    This impacts the output of `git branch -v`, and `git branch`, `git
     -    tag` and `git for-each-ref` when used with a `--format` argument
     -    containing the atoms `%(contents:subject)` or `%(contents:body)`.
     +    This impacts the output `git branch`, `git tag` and `git for-each-ref`
     +    when used with a `--format` argument containing the atoms
     +    `%(contents:subject)` or `%(contents:body)`, as well as the output of
     +    `git branch --verbose`, which uses `%(contents:subject)` internally.
      
     -    Fix this bug in ref-filter by improving the logic in `copy_subject` and
     -    `find_subpos`.
     +    The function find_subpos in ref-filter.c looks for two consecutive '\n'
     +    to find the end of the subject line, a sequence which is absent in
     +    messages using CRLF. This results in the whole message being parsed as
     +    the subject line (`%(contents:subject)`), and the body of the message
     +    (`%(contents:body)`)  being empty.
     +
     +    Moreover, in copy_subject, '\n' is replaced by space, but '\r' is
     +    untouched, resulting in the escape sequence '^M' being output verbatim
     +    in most terminal emulators:
     +
     +        $ git branch --verbose
     +        * crlf    2113b0e Subject first line^M ^M Body first line^M Body second line
     +
     +    This bug is a regression for `git branch --verbose`, which
     +    bisects down to 949af0684c (branch: use ref-filter printing APIs,
     +    2017-01-10).
     +
     +    Fix this bug in ref-filter by hardening the logic in `copy_subject` and
     +    `find_subpos` to correctly parse messages containing CRFL.
      
          Add tests for `branch`, `tag` and `for-each-ref` using
     -    lib-crlf-messages.sh. The 'make commits' test at the beginning of
     -    t3203-branch-output.sh needs to be modified since it did not use
     -    `test_tick` and thus the commit hashes were not reproducible. For
     -    simplicity, use `test_commit` as the content and name of the files
     -    created in this setup test are irrelevant to the rest of the test
     -    script.
     +    lib-crlf-messages.sh.
     +
     +    The 'make commits' test at the beginning of t3203-branch-output.sh needs
     +    to be modified since it did not use `test_tick` and thus the commit
     +    hashes were not reproducible. For simplicity, use `test_commit` as the
     +    content and name of the files created in this setup test are irrelevant
     +    to the rest of the test script.
     +
     +    `test_cleanup_crlf_refs` is used in t3203-branch-output.sh and
     +    t7004-tag.sh to avoid having to modify the expected output in later
     +    tests.
      
          Signed-off-by: Philippe Blain <levraiphilippeblain@gmail.com>
      
 3:  3e5b8ace7b8 = 3:  c84092e457c log: add tests for messages containing CRLF to t4202

-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 38+ messages in thread

* [PATCH v2 1/3] t: add lib-crlf-messages.sh for messages containing CRLF
  2020-03-10  2:24 ` [PATCH v2 " Philippe Blain via GitGitGadget
@ 2020-03-10  2:24   ` Philippe Blain via GitGitGadget
  2020-03-10  2:24   ` [PATCH v2 2/3] ref-filter: fix the API to correctly handle CRLF Philippe Blain via GitGitGadget
                     ` (4 subsequent siblings)
  5 siblings, 0 replies; 38+ messages in thread
From: Philippe Blain via GitGitGadget @ 2020-03-10  2:24 UTC (permalink / raw)
  To: git
  Cc: Michael J Gruber, Matthieu Moy, John Keeping, Karthik Nayak,
	Jeff King, Alex Henrie, Philippe Blain, Philippe Blain

From: Philippe Blain <levraiphilippeblain@gmail.com>

A following commit will fix a bug in the ref-filter API that causes
commit and tag messages containing CRLF to be incorrectly parsed and
displayed.

Add a test library (t/lib-crlf-messages.sh) that creates refs with such
commit messages, so that we can easily test that this bug does not
appear in other commands in the future.

The function test_crlf_subject_body_and_contents can be used to test
that the `--format` option of `branch`, `tag`, `for-each-ref` and
`log` correctly displays the subject, body and raw body of commits and
tag messages.

The commits are created using `commit-tree` such that the current branch
in the test repository is not affected when `test_create_crlf_refs` is
called in a test. This is to done so that the CRLF tests can be inserted
anywhere in a test script where it makes sense to do so, without having
to potentially modify further test that depend on output that would be
modified if the current branch gained new commits.

Signed-off-by: Philippe Blain <levraiphilippeblain@gmail.com>
---
 t/lib-crlf-messages.sh | 73 ++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 73 insertions(+)
 create mode 100644 t/lib-crlf-messages.sh

diff --git a/t/lib-crlf-messages.sh b/t/lib-crlf-messages.sh
new file mode 100644
index 00000000000..64d2ad12019
--- /dev/null
+++ b/t/lib-crlf-messages.sh
@@ -0,0 +1,73 @@
+# Setup refs with commit and tag messages containing CRLF
+
+create_crlf_ref () {
+	message="$1" &&
+	subject="$2" &&
+	body="$3" &&
+	branch="$4" &&
+	printf "${message}" >.crlf-message-${branch}.txt &&
+	printf "${subject}" >.crlf-subject-${branch}.txt &&
+	printf "${body}" >.crlf-body-${branch}.txt &&
+    test_tick &&
+	hash=$(git commit-tree HEAD^{tree} -p HEAD -F .crlf-message-${branch}.txt) &&
+	git branch ${branch} ${hash} &&
+	git tag tag-${branch} ${branch} -F .crlf-message-${branch}.txt --cleanup=verbatim
+}
+
+create_crlf_refs () {
+	message="Subject first line\r\n\r\nBody first line\r\nBody second line\r\n" &&
+	body="Body first line\r\nBody second line\r\n" &&
+	subject="Subject first line" &&
+	create_crlf_ref "${message}" "${subject}" "${body}" "crlf" &&
+	message="Subject first line\r\n\r\n\r\nBody first line\r\nBody second line\r\n" &&
+	create_crlf_ref "${message}" "${subject}" "${body}" "crlf-empty-lines-after-subject" &&
+	message="Subject first line\r\nSubject second line\r\n\r\nBody first line\r\nBody second line\r\n" &&
+	subject="Subject first line Subject second line" &&
+	create_crlf_ref "${message}" "${subject}" "${body}" "crlf-two-line-subject"
+}
+
+test_create_crlf_refs () {
+	test_expect_success 'setup refs with CRLF commit messages' '
+		create_crlf_refs
+	'
+}
+
+cleanup_crlf_refs () {
+	for branch in crlf crlf-empty-lines-after-subject crlf-two-line-subject; do
+		git branch -D ${branch} &&
+		git tag -d tag-${branch} &&
+		rm .crlf-message-${branch}.txt &&
+		rm .crlf-subject-${branch}.txt &&
+		rm .crlf-body-${branch}.txt
+	done
+}
+
+test_cleanup_crlf_refs () {
+	test_expect_success 'clenup refs with CRLF commit messages' '
+		cleanup_crlf_refs
+	'
+}
+
+test_crlf_subject_body_and_contents() {
+	command_and_args="$@" &&
+	command=$1 &&
+	if [ ${command} = "branch" ] || [ ${command} = "for-each-ref" ] || [ ${command} = "tag" ]; then
+		atoms="(contents:subject) (contents:body) (contents)"
+	elif [ ${command} = "log" ]; then
+		atoms="s b B"
+	fi &&
+	files="subject body message" &&
+	while  [ -n "${atoms}" ]; do
+		set ${atoms} && atom=$1 && shift && atoms="$*" &&
+		set ${files} &&	file=$1 && shift && files="$*" &&
+		test_expect_success "${command}: --format='%${atom}' works with CRLF input" "
+			rm -f expect &&
+			for ref in crlf crlf-empty-lines-after-subject crlf-two-line-subject; do
+				cat .crlf-${file}-\"\${ref}\".txt >>expect &&
+				printf \"\n\" >>expect
+			done &&
+			git $command_and_args --format=\"%${atom}\" >actual &&
+			test_cmp expect actual
+		"
+	done
+}
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH v2 2/3] ref-filter: fix the API to correctly handle CRLF
  2020-03-10  2:24 ` [PATCH v2 " Philippe Blain via GitGitGadget
  2020-03-10  2:24   ` [PATCH v2 1/3] t: add lib-crlf-messages.sh for messages containing CRLF Philippe Blain via GitGitGadget
@ 2020-03-10  2:24   ` Philippe Blain via GitGitGadget
  2020-03-10 17:50     ` Junio C Hamano
  2020-03-10  2:24   ` [PATCH v2 3/3] log: add tests for messages containing CRLF to t4202 Philippe Blain via GitGitGadget
                     ` (3 subsequent siblings)
  5 siblings, 1 reply; 38+ messages in thread
From: Philippe Blain via GitGitGadget @ 2020-03-10  2:24 UTC (permalink / raw)
  To: git
  Cc: Michael J Gruber, Matthieu Moy, John Keeping, Karthik Nayak,
	Jeff King, Alex Henrie, Philippe Blain, Philippe Blain

From: Philippe Blain <levraiphilippeblain@gmail.com>

The ref-filter API does not correctly handle commit or tag messages that
use CRLF as the line terminator. Such messages can be created with the
`--verbatim` option of `git commit` and `git tag`, or by using `git
commit-tree` directly.

This impacts the output `git branch`, `git tag` and `git for-each-ref`
when used with a `--format` argument containing the atoms
`%(contents:subject)` or `%(contents:body)`, as well as the output of
`git branch --verbose`, which uses `%(contents:subject)` internally.

The function find_subpos in ref-filter.c looks for two consecutive '\n'
to find the end of the subject line, a sequence which is absent in
messages using CRLF. This results in the whole message being parsed as
the subject line (`%(contents:subject)`), and the body of the message
(`%(contents:body)`)  being empty.

Moreover, in copy_subject, '\n' is replaced by space, but '\r' is
untouched, resulting in the escape sequence '^M' being output verbatim
in most terminal emulators:

    $ git branch --verbose
    * crlf    2113b0e Subject first line^M ^M Body first line^M Body second line

This bug is a regression for `git branch --verbose`, which
bisects down to 949af0684c (branch: use ref-filter printing APIs,
2017-01-10).

Fix this bug in ref-filter by hardening the logic in `copy_subject` and
`find_subpos` to correctly parse messages containing CRFL.

Add tests for `branch`, `tag` and `for-each-ref` using
lib-crlf-messages.sh.

The 'make commits' test at the beginning of t3203-branch-output.sh needs
to be modified since it did not use `test_tick` and thus the commit
hashes were not reproducible. For simplicity, use `test_commit` as the
content and name of the files created in this setup test are irrelevant
to the rest of the test script.

`test_cleanup_crlf_refs` is used in t3203-branch-output.sh and
t7004-tag.sh to avoid having to modify the expected output in later
tests.

Signed-off-by: Philippe Blain <levraiphilippeblain@gmail.com>
---
 ref-filter.c             | 19 +++++++++++++++++--
 t/t3203-branch-output.sh | 26 +++++++++++++++++++++-----
 t/t6300-for-each-ref.sh  |  5 +++++
 t/t7004-tag.sh           |  7 +++++++
 4 files changed, 50 insertions(+), 7 deletions(-)

diff --git a/ref-filter.c b/ref-filter.c
index 79bb5206783..537cc4de42c 100644
--- a/ref-filter.c
+++ b/ref-filter.c
@@ -1050,10 +1050,18 @@ static char *copy_subject(const char *buf, unsigned long len)
 {
 	char *r = xmemdupz(buf, len);
 	int i;
+	struct strbuf sb = STRBUF_INIT;
 
-	for (i = 0; i < len; i++)
+	strbuf_attach(&sb, r, len, len + 1);
+	for (i = 0; i < sb.len; i++) {
 		if (r[i] == '\n')
 			r[i] = ' ';
+		if (r[i] == '\r') {
+			strbuf_remove(&sb, i, 1);
+			i -= 1;
+		}
+	}
+	strbuf_detach(&sb, NULL);
 
 	return r;
 }
@@ -1184,15 +1192,22 @@ static void find_subpos(const char *buf,
 		eol = strchrnul(buf, '\n');
 		if (*eol)
 			eol++;
+		/*  protect against messages that might contain \r\n */
+		if (*eol == '\r')
+			eol++;
 		buf = eol;
 	}
 	*sublen = buf - *sub;
 	/* drop trailing newline, if present */
 	if (*sublen && (*sub)[*sublen - 1] == '\n')
 		*sublen -= 1;
+	/*  protect against commit messages that might contain \r\n */
+	else if (*sublen && (*sub)[*sublen - 1] == '\r')
+		*sublen -= 3; /* drop '\r\n\r' */
 
 	/* skip any empty lines */
-	while (*buf == '\n')
+	/* and protect against commit messages that might contain \r\n */
+	while (*buf == '\n' || *buf == '\r')
 		buf++;
 	*body = buf;
 	*bodylen = strlen(buf);
diff --git a/t/t3203-branch-output.sh b/t/t3203-branch-output.sh
index 71818b90f00..1235089619c 100755
--- a/t/t3203-branch-output.sh
+++ b/t/t3203-branch-output.sh
@@ -3,13 +3,11 @@
 test_description='git branch display tests'
 . ./test-lib.sh
 . "$TEST_DIRECTORY"/lib-terminal.sh
+. "$TEST_DIRECTORY"/lib-crlf-messages.sh
 
 test_expect_success 'make commits' '
-	echo content >file &&
-	git add file &&
-	git commit -m one &&
-	echo content >>file &&
-	git commit -a -m two
+	test_commit one &&
+	test_commit two
 '
 
 test_expect_success 'make branches' '
@@ -95,6 +93,24 @@ test_expect_success 'git branch --ignore-case --list -v pattern shows branch sum
 	awk "{print \$NF}" <tmp >actual &&
 	test_cmp expect actual
 '
+test_create_crlf_refs
+
+test_expect_success 'git branch -v works with CRLF input' '
+	cat >expect <<-EOF &&
+	  branch-one                     139b20d two
+	  branch-two                     d79ce16 one
+	  crlf                           2113b0e Subject first line
+	  crlf-empty-lines-after-subject 0a9530d Subject first line
+	  crlf-two-line-subject          f9ded1f Subject first line Subject second line
+	* master                         139b20d two
+	EOF
+	git branch -v >actual &&
+	test_cmp expect actual
+'
+
+test_crlf_subject_body_and_contents branch --list crlf*
+
+test_cleanup_crlf_refs
 
 test_expect_success 'git branch -v pattern does not show branch summaries' '
 	test_must_fail git branch -v branch*
diff --git a/t/t6300-for-each-ref.sh b/t/t6300-for-each-ref.sh
index 9c910ce7467..467924de3df 100755
--- a/t/t6300-for-each-ref.sh
+++ b/t/t6300-for-each-ref.sh
@@ -8,6 +8,7 @@ test_description='for-each-ref test'
 . ./test-lib.sh
 . "$TEST_DIRECTORY"/lib-gpg.sh
 . "$TEST_DIRECTORY"/lib-terminal.sh
+. "$TEST_DIRECTORY"/lib-crlf-messages.sh
 
 # Mon Jul 3 23:18:43 2006 +0000
 datestamp=1151968723
@@ -888,4 +889,8 @@ test_expect_success 'for-each-ref --ignore-case ignores case' '
 	test_cmp expect actual
 '
 
+test_create_crlf_refs
+
+test_crlf_subject_body_and_contents for-each-ref refs/heads/crlf*
+
 test_done
diff --git a/t/t7004-tag.sh b/t/t7004-tag.sh
index 6db92bd3ba6..25f465f34ea 100755
--- a/t/t7004-tag.sh
+++ b/t/t7004-tag.sh
@@ -10,6 +10,7 @@ Tests for operations with tags.'
 . ./test-lib.sh
 . "$TEST_DIRECTORY"/lib-gpg.sh
 . "$TEST_DIRECTORY"/lib-terminal.sh
+. "$TEST_DIRECTORY"/lib-crlf-messages.sh
 
 # creating and listing lightweight tags:
 
@@ -1969,6 +1970,12 @@ test_expect_success '--format should list tags as per format given' '
 	test_cmp expect actual
 '
 
+test_create_crlf_refs
+
+test_crlf_subject_body_and_contents tag --list tag-crlf*
+
+test_cleanup_crlf_refs
+
 test_expect_success "set up color tests" '
 	echo "<RED>v1.0<RESET>" >expect.color &&
 	echo "v1.0" >expect.bare &&
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH v2 3/3] log: add tests for messages containing CRLF to t4202
  2020-03-10  2:24 ` [PATCH v2 " Philippe Blain via GitGitGadget
  2020-03-10  2:24   ` [PATCH v2 1/3] t: add lib-crlf-messages.sh for messages containing CRLF Philippe Blain via GitGitGadget
  2020-03-10  2:24   ` [PATCH v2 2/3] ref-filter: fix the API to correctly handle CRLF Philippe Blain via GitGitGadget
@ 2020-03-10  2:24   ` Philippe Blain via GitGitGadget
  2020-03-10  3:31   ` [PATCH v2 0/3] Teach ref-filter API to correctly handle CRLF in messages Junio C Hamano
                     ` (2 subsequent siblings)
  5 siblings, 0 replies; 38+ messages in thread
From: Philippe Blain via GitGitGadget @ 2020-03-10  2:24 UTC (permalink / raw)
  To: git
  Cc: Michael J Gruber, Matthieu Moy, John Keeping, Karthik Nayak,
	Jeff King, Alex Henrie, Philippe Blain, Philippe Blain

From: Philippe Blain <levraiphilippeblain@gmail.com>

A previous commit fixed a bug in the ref-filter API causing messages
containing CRLF to be incorrectly parsed and displayed.

Add a test to also check that `git log` correctly handles such messages,
to prevent futur regressions.

Signed-off-by: Philippe Blain <levraiphilippeblain@gmail.com>
---
 t/t4202-log.sh | 24 ++++++++++++++++++++++++
 1 file changed, 24 insertions(+)

diff --git a/t/t4202-log.sh b/t/t4202-log.sh
index 4694b6d0ce7..1e6149372c5 100755
--- a/t/t4202-log.sh
+++ b/t/t4202-log.sh
@@ -6,6 +6,7 @@ test_description='git log'
 . "$TEST_DIRECTORY/lib-gpg.sh"
 . "$TEST_DIRECTORY/lib-terminal.sh"
 . "$TEST_DIRECTORY/lib-log-graph.sh"
+. "$TEST_DIRECTORY/lib-crlf-messages.sh"
 
 test_cmp_graph () {
 	lib_test_cmp_graph --format=%s "$@"
@@ -105,6 +106,29 @@ test_expect_success 'oneline' '
 	test_cmp expect actual
 '
 
+test_create_crlf_refs
+
+test_expect_success 'oneline with CRLF messages' '
+	echo "002093e Subject first line" >expect &&
+	git log --oneline -1 crlf >actual-branch &&
+	git log --oneline -1 tag-crlf >actual-tag &&
+	test_cmp expect actual-branch &&
+	test_cmp expect actual-tag &&
+	echo "6f814b0 Subject first line" >expect &&
+	git log --oneline -1 crlf-empty-lines-after-subject >actual-branch &&
+	git log --oneline -1 tag-crlf-empty-lines-after-subject >actual-tag &&
+	test_cmp expect actual-branch &&
+	test_cmp expect actual-tag &&
+	echo "8c58a85 Subject first line Subject second line" >expect &&
+	git log --oneline -1 crlf-two-line-subject >actual-branch &&
+	git log --oneline -1 tag-crlf-two-line-subject >actual-tag &&
+	test_cmp expect actual-branch &&
+	test_cmp expect actual-tag
+'
+test_crlf_subject_body_and_contents log --all --reverse --grep Subject
+
+test_cleanup_crlf_refs
+
 test_expect_success 'diff-filter=A' '
 
 	git log --no-renames --pretty="format:%s" --diff-filter=A HEAD > actual &&
-- 
gitgitgadget

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* Re: [PATCH v2 0/3] Teach ref-filter API to correctly handle CRLF in messages
  2020-03-10  2:24 ` [PATCH v2 " Philippe Blain via GitGitGadget
                     ` (2 preceding siblings ...)
  2020-03-10  2:24   ` [PATCH v2 3/3] log: add tests for messages containing CRLF to t4202 Philippe Blain via GitGitGadget
@ 2020-03-10  3:31   ` Junio C Hamano
  2020-03-10 17:24   ` Junio C Hamano
  2020-10-12 18:09   ` [PATCH v3 0/3] ref-filter: handle CRLF at end-of-line more gracefully Philippe Blain via GitGitGadget
  5 siblings, 0 replies; 38+ messages in thread
From: Junio C Hamano @ 2020-03-10  3:31 UTC (permalink / raw)
  To: Philippe Blain via GitGitGadget
  Cc: git, Michael J Gruber, Matthieu Moy, John Keeping, Karthik Nayak,
	Jeff King, Alex Henrie, Philippe Blain

"Philippe Blain via GitGitGadget" <gitgitgadget@gmail.com> writes:

> The function find_subpos in ref-filter.c looks for two consecutive '\n' to
> find the end of the subject line, a sequence which is absent in messages
> using CRLF.

This is crucial, and makes it obvious why a change is needed.
Thanks for clearly writing it out.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v2 0/3] Teach ref-filter API to correctly handle CRLF in messages
  2020-03-10  2:24 ` [PATCH v2 " Philippe Blain via GitGitGadget
                     ` (3 preceding siblings ...)
  2020-03-10  3:31   ` [PATCH v2 0/3] Teach ref-filter API to correctly handle CRLF in messages Junio C Hamano
@ 2020-03-10 17:24   ` Junio C Hamano
  2020-10-12 18:09   ` [PATCH v3 0/3] ref-filter: handle CRLF at end-of-line more gracefully Philippe Blain via GitGitGadget
  5 siblings, 0 replies; 38+ messages in thread
From: Junio C Hamano @ 2020-03-10 17:24 UTC (permalink / raw)
  To: Philippe Blain via GitGitGadget
  Cc: git, Michael J Gruber, Matthieu Moy, John Keeping, Karthik Nayak,
	Jeff King, Alex Henrie, Philippe Blain

"Philippe Blain via GitGitGadget" <gitgitgadget@gmail.com> writes:

> The function find_subpos in ref-filter.c looks for two consecutive '\n' to
> find the end of the subject line, a sequence which is absent in messages
> using CRLF. This results in the whole message being parsed as the subject
> line (%(contents:subject)), and the body of the message (%(contents:body))
> being empty.

To be honest, I suspect that it is not a bug in the parser that
parsed out %(contents:subject), but a user error that left the log
message in CRLF endings ;-).

So "correctly handle CRLF" is probably a tad unfair to those who
wrote the current ref-filter code; a description that is more fair
to them is probably along the lines of "handle malformed log
messages more gracefully", I would think.

> Moreover, in copy_subject, '\n' is replaced by space, but '\r' is untouched,
> resulting in the escape sequence '^M' being output verbatim in most terminal
> emulators:
>
> $ git branch --verbose
> * crlf    2113b0e Subject first line^M ^M Body first line^M Body second line
>
> This bug is a regression for git branch --verbose, which bisects down to
> 949af06 (branch: use ref-filter printing APIs, 2017-01-10).

I am not sure where you want to go with this.  Whether it is shown
in the ^X notation (and some terminals even reverse color to
highlight them), or it is shown literally (i.e. causing the next
byte to overwrite the same line starting from the left-edge), you
would be annoyed either way, no?  I suspect that the latter would
annoy you even more.  Isn't what "most terminal emulators" do,
i.e. to show it in the ^X notation instead of emitting it literally,
a good thing?  IOW, "resulting in ..." is not correctly telling us
what you think is wrong---you don't have to blame terminals.

It is not limited to CR, and is not limited to control characters at
the end of the lines, no?  If you had "\a" (or "\r") in the middle
of the title, either the current or the old code would ring a bell
(or cause the next character to appear at the end of the same line)
or when piped to "less" you'd see "^G" (or "^M") in the liddle of
the line.

The old code used pretty.c::pretty_print_commit() mechanism;
pretty.c::format_subject() uses pretty.c::is_blank_line() to trim
whitespaces at the right end while trying to notice where the first
paragraph break is, so any whitespace at the end of first paragraph
break is removed, and each end of line got replaced by a SP, but it
did not do anything special to control characters in the middle of
the lines (and it didn't do anything to the control characters in
the middle of the line, either).  So while the old code happened to
cleanse CR at the end of the lines, it wasn't doing enough.

I think fixing _that_ is (and should be) outside the scope of this
series, of course.

>  2:  c68bc2b3788 ! 2:  aab1f45ba97 ref-filter: teach the API to correctly handle CRLF
>      @@ -1,26 +1,49 @@
>       Author: Philippe Blain <levraiphilippeblain@gmail.com>
>       
>      -    ref-filter: teach the API to correctly handle CRLF
>      +    ref-filter: fix the API to correctly handle CRLF

API is not changed (i.e. the callers do not have to do anything
special); only the implementation.

	ref-filter: handle CR at the end of the lines more gracefully

perhaps?

>           The ref-filter API does not correctly handle commit or tag messages that
>           use CRLF as the line terminator. Such messages can be created with the
>           `--verbatim` option of `git commit` and `git tag`, or by using `git
>           commit-tree` directly.
>       
>      +    This impacts the output `git branch`, `git tag` and `git for-each-ref`
>      +    when used with a `--format` argument containing the atoms
>      +    `%(contents:subject)` or `%(contents:body)`, as well as the output of
>      +    `git branch --verbose`, which uses `%(contents:subject)` internally.

In other words...

	When a commit or a tag object uses CRLF line endings, the
	ref-filter machinery does not identify the end of the first
	paragraph as intended by the writer, because it only looks
	for two consecutive LFs and CR-LF-CR-LF does not look like a
	blank line that separates paragraphs to it.  "git branch",
	"git tag" and "git for-each-ref" all rely on the messages
	split correctly into "%(contents:subject)" and
	"%(contents:body)" placeholders and ends up showing
	everything as the subject.

Now based on what I hinted in the far-above part, there can be two
valid solutions here.

 * recognize CRLF as a valid line ending, but still retain ^M in the
   message.  The replacement for "%(contents:subject)" would still
   end with "^M", and we add LF to it, which makes the resulting
   output end with CRLF and all is well.  This will keep "\a" and
   "\r" in the middle of the line in the output.

 * strip CR and any control character other than LF from everywhere.
   This will cleanse "\a" and "\r" in the middle of, or anywhere on,
   the line, so that "%(contents:subject)", "%(contents:body)" and
   "%(contents)" all are "clean".

I am not offhand sure which one is better (I haven't read the patch
to see which one you chose to implement).

>      +    The function find_subpos in ref-filter.c looks for two consecutive '\n'
>      +    to find the end of the subject line, a sequence which is absent in
>      +    messages using CRLF. This results in the whole message being parsed as
>      +    the subject line (`%(contents:subject)`), and the body of the message
>      +    (`%(contents:body)`)  being empty.



>      +    Moreover, in copy_subject, '\n' is replaced by space, but '\r' is
>      +    untouched, resulting in the escape sequence '^M' being output verbatim
>      +    in most terminal emulators:
>      ...
>      +    This bug is a regression for `git branch --verbose`, which
>      +    bisects down to 949af0684c (branch: use ref-filter printing APIs,
>      +    2017-01-10).
>      +
>      +    Fix this bug in ref-filter by hardening the logic in `copy_subject` and
>      +    `find_subpos` to correctly parse messages containing CRFL.

The above few lines may need revising (based on what I said to the
cover); --- even if they don't, CRFL here needs to become CRLF ;-)

Thanks for working on this.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v2 2/3] ref-filter: fix the API to correctly handle CRLF
  2020-03-10  2:24   ` [PATCH v2 2/3] ref-filter: fix the API to correctly handle CRLF Philippe Blain via GitGitGadget
@ 2020-03-10 17:50     ` Junio C Hamano
  0 siblings, 0 replies; 38+ messages in thread
From: Junio C Hamano @ 2020-03-10 17:50 UTC (permalink / raw)
  To: Philippe Blain via GitGitGadget
  Cc: git, Michael J Gruber, Matthieu Moy, John Keeping, Karthik Nayak,
	Jeff King, Alex Henrie, Philippe Blain

"Philippe Blain via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Philippe Blain <levraiphilippeblain@gmail.com>
>
> The ref-filter API does not correctly handle commit or tag messages that
> ...

(I won't repeat myself here; see 0/3)

> Signed-off-by: Philippe Blain <levraiphilippeblain@gmail.com>
> ---
>  ref-filter.c             | 19 +++++++++++++++++--
>  t/t3203-branch-output.sh | 26 +++++++++++++++++++++-----
>  t/t6300-for-each-ref.sh  |  5 +++++
>  t/t7004-tag.sh           |  7 +++++++
>  4 files changed, 50 insertions(+), 7 deletions(-)
>
> diff --git a/ref-filter.c b/ref-filter.c
> index 79bb5206783..537cc4de42c 100644
> --- a/ref-filter.c
> +++ b/ref-filter.c
> @@ -1050,10 +1050,18 @@ static char *copy_subject(const char *buf, unsigned long len)
>  {
>  	char *r = xmemdupz(buf, len);
>  	int i;
> +	struct strbuf sb = STRBUF_INIT;
>  
> -	for (i = 0; i < len; i++)
> +	strbuf_attach(&sb, r, len, len + 1);
> +	for (i = 0; i < sb.len; i++) {
>  		if (r[i] == '\n')
>  			r[i] = ' ';
> +		if (r[i] == '\r') {
> +			strbuf_remove(&sb, i, 1);
> +			i -= 1;
> +		}
> +	}
> +	strbuf_detach(&sb, NULL);
>  	return r;
>  }

So, the chosen solution is to only remove CR and do so only in the
first paragraph.  Even if the second and subsequent paragraphs use
CRLF line endings, those CRs are retained.  Also, a lone CR in the
first paragraph that is not part of CRLF end-of-line marker is lost,
but other control characters like "\a" are retained.  That sounds
like an almost "minimum" change but not quite.

The way strbuf is used in the implementation is a bit curious and
risky.  Currently we do not realloc to shrink a strbuf, but when we
start doing so, this code would break because you are relying on the
fact that just before calling strbuf_detach(), sb.buf happens to be
still the same as r.

As the point of the function is "we want to return a copy of what is
in buf[0..len] but the input is a (possibly multi-line) paragraph,
and we want a single line 'title', so replace end-of-line with a
SP", a minimal translation that is more faithful to the intended
meaning of the function would be:

	static char *copy_subject(const char *buf, unsigned long len)
	{
		struct strbuf sb = STRBUF_INIT;

		for (i = 0; i < len; i++) {
                	if (buf[i] == '\r' &&
			    i + 1 < len && buf[i + 1] == '\n')
                            continue; /* ignore CR in CRLF */

			if (buf[i] == '\n')
				strbuf_addch(&sb, ' ');
			else
				strbuf_addch(&sb, buf[i]);
		}
                return strbuf_detach(&sb, NULL);
	}

perhaps?  This retains CR in the middle if exists just like BEL in
the middle of the line, and uses strbuf in a safe way, I think.

> @@ -1184,15 +1192,22 @@ static void find_subpos(const char *buf,
>  		eol = strchrnul(buf, '\n');
>  		if (*eol)
>  			eol++;
> +		/*  protect against messages that might contain \r\n */
> +		if (*eol == '\r')
> +			eol++;

This is quite convoluted.  You found a LF and then are hoping to see
if the byte after LF is CR (i.e. you are looking for LFCR, not
CRLF).

>  		buf = eol;
>  	}
>  	*sublen = buf - *sub;
>  	/* drop trailing newline, if present */
>  	if (*sublen && (*sub)[*sublen - 1] == '\n')
>  		*sublen -= 1;
> +	/*  protect against commit messages that might contain \r\n */
> +	else if (*sublen && (*sub)[*sublen - 1] == '\r')
> +		*sublen -= 3; /* drop '\r\n\r' */

Yeek.  To find CR-LF-CR-LF, you look for CR-LF-CR?  You only checked
that the previous byte is NOT LF (because you are in else-if, so the
previous if must have failed) and you have at least one previous byte
that is CR.  What gives us OK that *sublen is sufficiently long that
we can safely subtract 3 from it (we only checked that it is not 0;
who says it is 3 or more in this code???) and the two bytes before
the CR we are looking at here are CRLF???

^ permalink raw reply	[flat|nested] 38+ messages in thread

* [PATCH v3 0/3] ref-filter: handle CRLF at end-of-line more gracefully
  2020-03-10  2:24 ` [PATCH v2 " Philippe Blain via GitGitGadget
                     ` (4 preceding siblings ...)
  2020-03-10 17:24   ` Junio C Hamano
@ 2020-10-12 18:09   ` Philippe Blain via GitGitGadget
  2020-10-12 18:09     ` [PATCH v3 1/3] t: add lib-crlf-messages.sh for messages containing CRLF Philippe Blain via GitGitGadget
                       ` (3 more replies)
  5 siblings, 4 replies; 38+ messages in thread
From: Philippe Blain via GitGitGadget @ 2020-10-12 18:09 UTC (permalink / raw)
  To: git
  Cc: Michael J Gruber, Matthieu Moy, John Keeping, Karthik Nayak,
	Jeff King, Alex Henrie, Philippe Blain

Changes since v2:

 * in response to Junio's comments, improved the logic in 'find_subpos'. I
   tried to also simplify the code by using 'str*' functions, to make it
   more obvious what the code is actually looking for in the message. 
 * took Junio's suggestion for 'copy_subject'
 * in response to Junio's comments, tweaked the commit message for patch 2/3 
 * refactored 'lib-crlf-messages.sh' to reduce duplication (patch 1/3)
 * added tests for 'git show' in patch 3/3 
 * removed commit hashes in tests to make them hash-agnostic


----------------------------------------------------------------------------

The ref-filter code does not correctly handle commit or tag messages that
use CRLF as the line terminator. Such messages can be created with the
--verbatim option of git commit and git tag, or by using git commit-tree 
directly.

The function find_subpos in ref-filter.c looks for two consecutive '\n' to
find the end of the subject line, a sequence which is absent in messages
using CRLF. This results in the whole message being parsed as the subject
line (%(contents:subject)), and the body of the message (%(contents:body))
being empty.

Moreover, in copy_subject, '\n' is replaced by space, but '\r' is untouched.

This bug is a regression for git branch --verbose, which bisects down to
949af06 (branch: use ref-filter printing APIs, 2017-01-10).

The first patch in this series adds a new test library, 
t/lib-crlf-messages.sh, that contains functions to test this behaviour for
commands using either the ref-filter or the pretty API to display messages. 

The second patch fixes the bug in ref-filter.c and adds corresponding tests.

Finally, the third patch adds a test that checks the behaviour of git log 
andgit show in the presence of CRLF in messages, to prevent futur
regressions.


----------------------------------------------------------------------------

v2:

The ref-filter API does not correctly handle commit or tag messages that use
CRLF as the line terminator. Such messages can be created with the--verbatim 
option of git commit and git tag, or by using git commit-tree directly.

This impacts the output git branch, git tag and git for-each-refwhen used
with a --format argument containing the atoms%(contents:subject) or 
%(contents:body), as well as the output ofgit branch --verbose, which uses 
%(contents:subject) internally.

The function find_subpos in ref-filter.c looks for two consecutive '\n' to
find the end of the subject line, a sequence which is absent in messages
using CRLF. This results in the whole message being parsed as the subject
line (%(contents:subject)), and the body of the message (%(contents:body))
being empty.

Moreover, in copy_subject, '\n' is replaced by space, but '\r' is untouched,
resulting in the escape sequence '^M' being output verbatim in most terminal
emulators:

$ git branch --verbose
* crlf    2113b0e Subject first line^M ^M Body first line^M Body second line

This bug is a regression for git branch --verbose, which bisects down to
949af06 (branch: use ref-filter printing APIs, 2017-01-10).

The first patch in this series adds a new test library, 
t/lib-crlf-messages.sh, that contains functions to test this behaviour for
commands using either the ref-filter or the pretty API to display messages. 

The second patch fixes the bug in ref-filter.c and adds corresponding tests.

Finally, the third patch adds a test that checks the behaviour of git log in
the presence of CRLF in messages, to prevent futur regressions.

changes since v1:

 * improved the cover letter and the commit message of the 2nd patch to
   actually describe the bug this series is fixing


----------------------------------------------------------------------------

v1:

The ref-filter API does not correctly handle commit or tag messages that use
CRLF as the line terminator. Such messages can be created with the--verbatim 
option of git commit and git tag, or by using git commit-tree directly.

This impacts the output of git branch -v, and git branch, git tag and git
for-each-ref when used with a --format argument containing the atoms 
%(contents:subject) or %(contents:body).

The first patch in this series adds a new test library, 
t/lib-crlf-messages.sh, that contains functions to test this behaviour for
commands using either the ref-filter or the pretty API to display messages. 

The second patch fixes the bug in ref-filter.c and adds corresponding tests.

Finally, the third patch adds a test that checks the behaviour of git log in
the presence of CRLF in messages, to prevent futur regressions.

Philippe Blain (3):
  t: add lib-crlf-messages.sh for messages containing CRLF
  ref-filter: handle CRLF at end-of-line more gracefully
  log, show: add tests for messages containing CRLF

 ref-filter.c             | 36 +++++++++-------
 t/lib-crlf-messages.sh   | 90 ++++++++++++++++++++++++++++++++++++++++
 t/t3203-branch-output.sh | 31 +++++++++++---
 t/t4202-log.sh           | 18 ++++++++
 t/t6300-for-each-ref.sh  |  5 +++
 t/t7004-tag.sh           |  7 ++++
 t/t7007-show.sh          |  5 +++
 7 files changed, 172 insertions(+), 20 deletions(-)
 create mode 100644 t/lib-crlf-messages.sh


base-commit: d4a392452e292ff924e79ec8458611c0f679d6d4
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-576%2Fphil-blain%2Ffix-branch-verbose-crlf-v3
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-576/phil-blain/fix-branch-verbose-crlf-v3
Pull-Request: https://github.com/gitgitgadget/git/pull/576

Range-diff vs v2:

 1:  b252113803 ! 1:  f17d182c3b t: add lib-crlf-messages.sh for messages containing CRLF
     @@ Commit message
          commit messages, so that we can easily test that this bug does not
          appear in other commands in the future.
      
     -    The function test_crlf_subject_body_and_contents can be used to test
     +    The function `test_crlf_subject_body_and_contents` can be used to test
          that the `--format` option of `branch`, `tag`, `for-each-ref` and
     -    `log` correctly displays the subject, body and raw body of commits and
     +    `log` correctly displays the subject, body and raw content of commits and
          tag messages.
      
          The commits are created using `commit-tree` such that the current branch
          in the test repository is not affected when `test_create_crlf_refs` is
     -    called in a test. This is to done so that the CRLF tests can be inserted
     -    anywhere in a test script where it makes sense to do so, without having
     -    to potentially modify further test that depend on output that would be
     +    called in a test. This is done so that the CRLF tests can be inserted
     +    anywhere in a test script, where it makes sense to do so, without having
     +    to potentially modify further tests that depend on output that would be
          modified if the current branch gained new commits.
      
          Signed-off-by: Philippe Blain <levraiphilippeblain@gmail.com>
     @@ t/lib-crlf-messages.sh (new)
      @@
      +# Setup refs with commit and tag messages containing CRLF
      +
     ++LIB_CRLF_BRANCHES=""
     ++
      +create_crlf_ref () {
      +	message="$1" &&
      +	subject="$2" &&
     @@ t/lib-crlf-messages.sh (new)
      +	printf "${message}" >.crlf-message-${branch}.txt &&
      +	printf "${subject}" >.crlf-subject-${branch}.txt &&
      +	printf "${body}" >.crlf-body-${branch}.txt &&
     -+    test_tick &&
     ++	LIB_CRLF_BRANCHES="${LIB_CRLF_BRANCHES} ${branch}"
     ++	test_tick &&
      +	hash=$(git commit-tree HEAD^{tree} -p HEAD -F .crlf-message-${branch}.txt) &&
      +	git branch ${branch} ${hash} &&
      +	git tag tag-${branch} ${branch} -F .crlf-message-${branch}.txt --cleanup=verbatim
     @@ t/lib-crlf-messages.sh (new)
      +	message="Subject first line\r\n\r\nBody first line\r\nBody second line\r\n" &&
      +	body="Body first line\r\nBody second line\r\n" &&
      +	subject="Subject first line" &&
     -+	create_crlf_ref "${message}" "${subject}" "${body}" "crlf" &&
     ++	branch="crlf" &&
     ++	create_crlf_ref "${message}" "${subject}" "${body}" "${branch}" &&
      +	message="Subject first line\r\n\r\n\r\nBody first line\r\nBody second line\r\n" &&
     -+	create_crlf_ref "${message}" "${subject}" "${body}" "crlf-empty-lines-after-subject" &&
     ++	branch="crlf-empty-lines-after-subject" &&
     ++	create_crlf_ref "${message}" "${subject}" "${body}" "${branch}" &&
      +	message="Subject first line\r\nSubject second line\r\n\r\nBody first line\r\nBody second line\r\n" &&
      +	subject="Subject first line Subject second line" &&
     -+	create_crlf_ref "${message}" "${subject}" "${body}" "crlf-two-line-subject"
     ++	branch="crlf-two-line-subject" &&
     ++	create_crlf_ref "${message}" "${subject}" "${body}" "${branch}" &&
     ++	message="Subject first line\r\nSubject second line" &&
     ++	subject="Subject first line Subject second line" &&
     ++	body="" &&
     ++	branch="crlf-two-line-subject-no-body" &&
     ++	create_crlf_ref "${message}" "${subject}" "${body}" "${branch}" &&
     ++	message="Subject first line\r\nSubject second line\r\n" &&
     ++	branch="crlf-two-line-subject-no-body-trailing-newline" &&
     ++	create_crlf_ref "${message}" "${subject}" "${body}" "${branch}" &&
     ++	message="Subject first line\r\nSubject second line\r\n\r" &&
     ++	branch="crlf-two-line-subject-no-body-trailing-newline2" &&
     ++	create_crlf_ref "${message}" "${subject}" "${body}" "${branch}"
      +}
      +
      +test_create_crlf_refs () {
     @@ t/lib-crlf-messages.sh (new)
      +}
      +
      +cleanup_crlf_refs () {
     -+	for branch in crlf crlf-empty-lines-after-subject crlf-two-line-subject; do
     ++	for branch in ${LIB_CRLF_BRANCHES}; do
      +		git branch -D ${branch} &&
      +		git tag -d tag-${branch} &&
      +		rm .crlf-message-${branch}.txt &&
     @@ t/lib-crlf-messages.sh (new)
      +}
      +
      +test_cleanup_crlf_refs () {
     -+	test_expect_success 'clenup refs with CRLF commit messages' '
     ++	test_expect_success 'cleanup refs with CRLF commit messages' '
      +		cleanup_crlf_refs
      +	'
      +}
     @@ t/lib-crlf-messages.sh (new)
      +	command=$1 &&
      +	if [ ${command} = "branch" ] || [ ${command} = "for-each-ref" ] || [ ${command} = "tag" ]; then
      +		atoms="(contents:subject) (contents:body) (contents)"
     -+	elif [ ${command} = "log" ]; then
     ++	elif [ ${command} = "log" ] || [ ${command} = "show" ]; then
      +		atoms="s b B"
      +	fi &&
      +	files="subject body message" &&
     @@ t/lib-crlf-messages.sh (new)
      +		set ${files} &&	file=$1 && shift && files="$*" &&
      +		test_expect_success "${command}: --format='%${atom}' works with CRLF input" "
      +			rm -f expect &&
     -+			for ref in crlf crlf-empty-lines-after-subject crlf-two-line-subject; do
     ++			for ref in ${LIB_CRLF_BRANCHES}; do
      +				cat .crlf-${file}-\"\${ref}\".txt >>expect &&
      +				printf \"\n\" >>expect
      +			done &&
 2:  aab1f45ba9 ! 2:  11d044a4f7 ref-filter: fix the API to correctly handle CRLF
     @@ Metadata
      Author: Philippe Blain <levraiphilippeblain@gmail.com>
      
       ## Commit message ##
     -    ref-filter: fix the API to correctly handle CRLF
     +    ref-filter: handle CRLF at end-of-line more gracefully
      
     -    The ref-filter API does not correctly handle commit or tag messages that
     +    The ref-filter code does not correctly handle commit or tag messages that
          use CRLF as the line terminator. Such messages can be created with the
          `--verbatim` option of `git commit` and `git tag`, or by using `git
          commit-tree` directly.
      
     -    This impacts the output `git branch`, `git tag` and `git for-each-ref`
     -    when used with a `--format` argument containing the atoms
     -    `%(contents:subject)` or `%(contents:body)`, as well as the output of
     -    `git branch --verbose`, which uses `%(contents:subject)` internally.
     -
     -    The function find_subpos in ref-filter.c looks for two consecutive '\n'
     -    to find the end of the subject line, a sequence which is absent in
     +    The function `find_subpos` in ref-filter.c looks for two consecutive
     +    LFs to find the end of the subject line, a sequence which is absent in
          messages using CRLF. This results in the whole message being parsed as
          the subject line (`%(contents:subject)`), and the body of the message
          (`%(contents:body)`)  being empty.
      
     -    Moreover, in copy_subject, '\n' is replaced by space, but '\r' is
     -    untouched, resulting in the escape sequence '^M' being output verbatim
     -    in most terminal emulators:
     +    Moreover, in `copy_subject`, which wants to return the subject as a
     +    single line, '\n' is replaced by space, but '\r' is
     +    untouched.
      
     -        $ git branch --verbose
     -        * crlf    2113b0e Subject first line^M ^M Body first line^M Body second line
     +    This impacts the output of `git branch`, `git tag` and `git
     +    for-each-ref`.
      
          This bug is a regression for `git branch --verbose`, which
          bisects down to 949af0684c (branch: use ref-filter printing APIs,
          2017-01-10).
      
          Fix this bug in ref-filter by hardening the logic in `copy_subject` and
     -    `find_subpos` to correctly parse messages containing CRFL.
     +    `find_subpos` to correctly parse messages containing CRLF.
      
          Add tests for `branch`, `tag` and `for-each-ref` using
          lib-crlf-messages.sh.
      
     -    The 'make commits' test at the beginning of t3203-branch-output.sh needs
     -    to be modified since it did not use `test_tick` and thus the commit
     -    hashes were not reproducible. For simplicity, use `test_commit` as the
     -    content and name of the files created in this setup test are irrelevant
     -    to the rest of the test script.
     +    The 'make commits' test at the beginning of t3203-branch-output.sh does
     +    not use `test_tick` and thus the commit hashes are not reproducible. For
     +    simplicity, use `test_commit` to create the commits, as the content and
     +    name of the files created in this setup test are irrelevant to the rest
     +    of the test script.
      
     -    `test_cleanup_crlf_refs` is used in t3203-branch-output.sh and
     -    t7004-tag.sh to avoid having to modify the expected output in later
     -    tests.
     +    Use `test_cleanup_crlf_refs` in t3203-branch-output.sh and t7004-tag.sh
     +    to avoid having to modify the expected output in later tests.
      
     +    Helped-by: Junio C Hamano <gitster@pobox.com>
          Signed-off-by: Philippe Blain <levraiphilippeblain@gmail.com>
      
       ## ref-filter.c ##
     -@@ ref-filter.c: static char *copy_subject(const char *buf, unsigned long len)
     +@@ ref-filter.c: static const char *copy_email(const char *buf, struct used_atom *atom)
     + 
     + static char *copy_subject(const char *buf, unsigned long len)
       {
     - 	char *r = xmemdupz(buf, len);
     - 	int i;
     +-	char *r = xmemdupz(buf, len);
     +-	int i;
      +	struct strbuf sb = STRBUF_INIT;
       
      -	for (i = 0; i < len; i++)
     -+	strbuf_attach(&sb, r, len, len + 1);
     -+	for (i = 0; i < sb.len; i++) {
     - 		if (r[i] == '\n')
     - 			r[i] = ' ';
     -+		if (r[i] == '\r') {
     -+			strbuf_remove(&sb, i, 1);
     -+			i -= 1;
     -+		}
     -+	}
     -+	strbuf_detach(&sb, NULL);
     +-		if (r[i] == '\n')
     +-			r[i] = ' ';
     ++	for (int i = 0; i < len; i++) {
     ++		if (buf[i] == '\r' && i + 1 < len && buf[i + 1] == '\n')
     ++			continue; /* ignore CR in CRLF */
       
     - 	return r;
     +-	return r;
     ++		if (buf[i] == '\n')
     ++			strbuf_addch(&sb, ' ');
     ++		else
     ++			strbuf_addch(&sb, buf[i]);
     ++	}
     ++	return strbuf_detach(&sb, NULL);
       }
     + 
     + static void grab_date(const char *buf, struct atom_value *v, const char *atomname)
      @@ ref-filter.c: static void find_subpos(const char *buf,
     - 		eol = strchrnul(buf, '\n');
     - 		if (*eol)
     - 			eol++;
     -+		/*  protect against messages that might contain \r\n */
     -+		if (*eol == '\r')
     -+			eol++;
     - 		buf = eol;
     - 	}
     + 
     + 	/* subject is first non-empty line */
     + 	*sub = buf;
     +-	/* subject goes to first empty line */
     +-	while (buf < *sig && *buf && *buf != '\n') {
     +-		eol = strchrnul(buf, '\n');
     +-		if (*eol)
     +-			eol++;
     +-		buf = eol;
     +-	}
     ++	/* subject goes to first empty line before signature begins */
     ++	if ((eol = strstr(*sub, "\n\n"))) {
     ++		eol = eol < *sig ? eol : *sig;
     ++	/* check if message uses CRLF */
     ++	} else if (! (eol = strstr(*sub, "\r\n\r\n"))) {
     ++		/* treat whole message as subject */
     ++		eol = strrchr(*sub, '\0');
     ++	}
     ++	buf = eol;
       	*sublen = buf - *sub;
       	/* drop trailing newline, if present */
     - 	if (*sublen && (*sub)[*sublen - 1] == '\n')
     +-	if (*sublen && (*sub)[*sublen - 1] == '\n')
     ++	while (*sublen && ((*sub)[*sublen - 1] == '\n' || (*sub)[*sublen - 1] == '\r'))
       		*sublen -= 1;
     -+	/*  protect against commit messages that might contain \r\n */
     -+	else if (*sublen && (*sub)[*sublen - 1] == '\r')
     -+		*sublen -= 3; /* drop '\r\n\r' */
       
       	/* skip any empty lines */
      -	while (*buf == '\n')
     -+	/* and protect against commit messages that might contain \r\n */
      +	while (*buf == '\n' || *buf == '\r')
       		buf++;
       	*body = buf;
     @@ t/t3203-branch-output.sh: test_expect_success 'git branch --ignore-case --list -
      +
      +test_expect_success 'git branch -v works with CRLF input' '
      +	cat >expect <<-EOF &&
     -+	  branch-one                     139b20d two
     -+	  branch-two                     d79ce16 one
     -+	  crlf                           2113b0e Subject first line
     -+	  crlf-empty-lines-after-subject 0a9530d Subject first line
     -+	  crlf-two-line-subject          f9ded1f Subject first line Subject second line
     -+	* master                         139b20d two
     ++	  two
     ++	  one
     ++	  Subject first line
     ++	  Subject first line
     ++	  Subject first line Subject second line
     ++	  Subject first line Subject second line
     ++	  Subject first line Subject second line
     ++	  Subject first line Subject second line
      +	EOF
     -+	git branch -v >actual &&
     ++	git branch -v >tmp &&
     ++	# Remove first two columns, and the line for the currently checked out branch
     ++	current=$(git branch --show-current) &&
     ++	grep -v $current <tmp | awk "{\$1=\$2=\"\"}1"  >actual &&
      +	test_cmp expect actual
      +'
      +
     @@ t/t6300-for-each-ref.sh: test_description='for-each-ref test'
       
       # Mon Jul 3 23:18:43 2006 +0000
       datestamp=1151968723
     -@@ t/t6300-for-each-ref.sh: test_expect_success 'for-each-ref --ignore-case ignores case' '
     +@@ t/t6300-for-each-ref.sh: test_expect_success 'for-each-ref --ignore-case works on multiple sort keys' '
       	test_cmp expect actual
       '
       
 3:  c84092e457 ! 3:  59957d1391 log: add tests for messages containing CRLF to t4202
     @@ Metadata
      Author: Philippe Blain <levraiphilippeblain@gmail.com>
      
       ## Commit message ##
     -    log: add tests for messages containing CRLF to t4202
     +    log, show: add tests for messages containing CRLF
      
     -    A previous commit fixed a bug in the ref-filter API causing messages
     +    A previous commit fixed a bug in ref-filter.c causing messages
          containing CRLF to be incorrectly parsed and displayed.
      
     -    Add a test to also check that `git log` correctly handles such messages,
     -    to prevent futur regressions.
     +    Add tests to also check that `git log` and `git show` correctly handle
     +    such messages, to prevent futur regressions if these commands are
     +    refactored to use the ref-filter API.
     +
     +    To prevent having to modify expected output in further tests, use
     +    'test_cleanup_crlf_refs' in t4202 to clean-up after the added tests.
      
          Signed-off-by: Philippe Blain <levraiphilippeblain@gmail.com>
      
     @@ t/t4202-log.sh: test_expect_success 'oneline' '
      +test_create_crlf_refs
      +
      +test_expect_success 'oneline with CRLF messages' '
     -+	echo "002093e Subject first line" >expect &&
     -+	git log --oneline -1 crlf >actual-branch &&
     -+	git log --oneline -1 tag-crlf >actual-tag &&
     -+	test_cmp expect actual-branch &&
     -+	test_cmp expect actual-tag &&
     -+	echo "6f814b0 Subject first line" >expect &&
     -+	git log --oneline -1 crlf-empty-lines-after-subject >actual-branch &&
     -+	git log --oneline -1 tag-crlf-empty-lines-after-subject >actual-tag &&
     -+	test_cmp expect actual-branch &&
     -+	test_cmp expect actual-tag &&
     -+	echo "8c58a85 Subject first line Subject second line" >expect &&
     -+	git log --oneline -1 crlf-two-line-subject >actual-branch &&
     -+	git log --oneline -1 tag-crlf-two-line-subject >actual-tag &&
     -+	test_cmp expect actual-branch &&
     -+	test_cmp expect actual-tag
     ++	for branch in $LIB_CLRF_BRANCHES; do
     ++		cat .crlf-subject-${branch}.txt >expect &&
     ++		git log --oneline -1 ${branch} >tmp-branch &&
     ++		git log --oneline -1 tag-${branch} >tmp-tag &&
     ++		awk "{print \$NF}" <tmp-branch >actual-branch &&
     ++		awk "{print \$NF}" <tmp-tag >actual-tag &&
     ++		test_cmp expect actual-branch &&
     ++		test_cmp expect actual-tag
     ++	done
      +'
      +test_crlf_subject_body_and_contents log --all --reverse --grep Subject
      +
     @@ t/t4202-log.sh: test_expect_success 'oneline' '
       test_expect_success 'diff-filter=A' '
       
       	git log --no-renames --pretty="format:%s" --diff-filter=A HEAD > actual &&
     +
     + ## t/t7007-show.sh ##
     +@@
     + test_description='git show'
     + 
     + . ./test-lib.sh
     ++. "$TEST_DIRECTORY/lib-crlf-messages.sh"
     + 
     + test_expect_success setup '
     + 	echo hello world >foo &&
     +@@ t/t7007-show.sh: test_expect_success 'show --graph is forbidden' '
     +   test_must_fail git show --graph HEAD
     + '
     + 
     ++test_create_crlf_refs
     ++
     ++test_crlf_subject_body_and_contents show $LIB_CRLF_BRANCHES
     ++
     + test_done

-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 38+ messages in thread

* [PATCH v3 1/3] t: add lib-crlf-messages.sh for messages containing CRLF
  2020-10-12 18:09   ` [PATCH v3 0/3] ref-filter: handle CRLF at end-of-line more gracefully Philippe Blain via GitGitGadget
@ 2020-10-12 18:09     ` Philippe Blain via GitGitGadget
  2020-10-12 22:22       ` Junio C Hamano
  2020-10-12 22:47       ` Eric Sunshine
  2020-10-12 18:09     ` [PATCH v3 2/3] ref-filter: handle CRLF at end-of-line more gracefully Philippe Blain via GitGitGadget
                       ` (2 subsequent siblings)
  3 siblings, 2 replies; 38+ messages in thread
From: Philippe Blain via GitGitGadget @ 2020-10-12 18:09 UTC (permalink / raw)
  To: git
  Cc: Michael J Gruber, Matthieu Moy, John Keeping, Karthik Nayak,
	Jeff King, Alex Henrie, Philippe Blain, Philippe Blain

From: Philippe Blain <levraiphilippeblain@gmail.com>

A following commit will fix a bug in the ref-filter API that causes
commit and tag messages containing CRLF to be incorrectly parsed and
displayed.

Add a test library (t/lib-crlf-messages.sh) that creates refs with such
commit messages, so that we can easily test that this bug does not
appear in other commands in the future.

The function `test_crlf_subject_body_and_contents` can be used to test
that the `--format` option of `branch`, `tag`, `for-each-ref` and
`log` correctly displays the subject, body and raw content of commits and
tag messages.

The commits are created using `commit-tree` such that the current branch
in the test repository is not affected when `test_create_crlf_refs` is
called in a test. This is done so that the CRLF tests can be inserted
anywhere in a test script, where it makes sense to do so, without having
to potentially modify further tests that depend on output that would be
modified if the current branch gained new commits.

Signed-off-by: Philippe Blain <levraiphilippeblain@gmail.com>
---
 t/lib-crlf-messages.sh | 90 ++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 90 insertions(+)
 create mode 100644 t/lib-crlf-messages.sh

diff --git a/t/lib-crlf-messages.sh b/t/lib-crlf-messages.sh
new file mode 100644
index 0000000000..10a2b57280
--- /dev/null
+++ b/t/lib-crlf-messages.sh
@@ -0,0 +1,90 @@
+# Setup refs with commit and tag messages containing CRLF
+
+LIB_CRLF_BRANCHES=""
+
+create_crlf_ref () {
+	message="$1" &&
+	subject="$2" &&
+	body="$3" &&
+	branch="$4" &&
+	printf "${message}" >.crlf-message-${branch}.txt &&
+	printf "${subject}" >.crlf-subject-${branch}.txt &&
+	printf "${body}" >.crlf-body-${branch}.txt &&
+	LIB_CRLF_BRANCHES="${LIB_CRLF_BRANCHES} ${branch}"
+	test_tick &&
+	hash=$(git commit-tree HEAD^{tree} -p HEAD -F .crlf-message-${branch}.txt) &&
+	git branch ${branch} ${hash} &&
+	git tag tag-${branch} ${branch} -F .crlf-message-${branch}.txt --cleanup=verbatim
+}
+
+create_crlf_refs () {
+	message="Subject first line\r\n\r\nBody first line\r\nBody second line\r\n" &&
+	body="Body first line\r\nBody second line\r\n" &&
+	subject="Subject first line" &&
+	branch="crlf" &&
+	create_crlf_ref "${message}" "${subject}" "${body}" "${branch}" &&
+	message="Subject first line\r\n\r\n\r\nBody first line\r\nBody second line\r\n" &&
+	branch="crlf-empty-lines-after-subject" &&
+	create_crlf_ref "${message}" "${subject}" "${body}" "${branch}" &&
+	message="Subject first line\r\nSubject second line\r\n\r\nBody first line\r\nBody second line\r\n" &&
+	subject="Subject first line Subject second line" &&
+	branch="crlf-two-line-subject" &&
+	create_crlf_ref "${message}" "${subject}" "${body}" "${branch}" &&
+	message="Subject first line\r\nSubject second line" &&
+	subject="Subject first line Subject second line" &&
+	body="" &&
+	branch="crlf-two-line-subject-no-body" &&
+	create_crlf_ref "${message}" "${subject}" "${body}" "${branch}" &&
+	message="Subject first line\r\nSubject second line\r\n" &&
+	branch="crlf-two-line-subject-no-body-trailing-newline" &&
+	create_crlf_ref "${message}" "${subject}" "${body}" "${branch}" &&
+	message="Subject first line\r\nSubject second line\r\n\r" &&
+	branch="crlf-two-line-subject-no-body-trailing-newline2" &&
+	create_crlf_ref "${message}" "${subject}" "${body}" "${branch}"
+}
+
+test_create_crlf_refs () {
+	test_expect_success 'setup refs with CRLF commit messages' '
+		create_crlf_refs
+	'
+}
+
+cleanup_crlf_refs () {
+	for branch in ${LIB_CRLF_BRANCHES}; do
+		git branch -D ${branch} &&
+		git tag -d tag-${branch} &&
+		rm .crlf-message-${branch}.txt &&
+		rm .crlf-subject-${branch}.txt &&
+		rm .crlf-body-${branch}.txt
+	done
+}
+
+test_cleanup_crlf_refs () {
+	test_expect_success 'cleanup refs with CRLF commit messages' '
+		cleanup_crlf_refs
+	'
+}
+
+test_crlf_subject_body_and_contents() {
+	command_and_args="$@" &&
+	command=$1 &&
+	if [ ${command} = "branch" ] || [ ${command} = "for-each-ref" ] || [ ${command} = "tag" ]; then
+		atoms="(contents:subject) (contents:body) (contents)"
+	elif [ ${command} = "log" ] || [ ${command} = "show" ]; then
+		atoms="s b B"
+	fi &&
+	files="subject body message" &&
+	while  [ -n "${atoms}" ]; do
+		set ${atoms} && atom=$1 && shift && atoms="$*" &&
+		set ${files} &&	file=$1 && shift && files="$*" &&
+		test_expect_success "${command}: --format='%${atom}' works with CRLF input" "
+			rm -f expect &&
+			for ref in ${LIB_CRLF_BRANCHES}; do
+				cat .crlf-${file}-\"\${ref}\".txt >>expect &&
+				printf \"\n\" >>expect
+			done &&
+			git $command_and_args --format=\"%${atom}\" >actual &&
+			test_cmp expect actual
+		"
+	done
+}
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH v3 2/3] ref-filter: handle CRLF at end-of-line more gracefully
  2020-10-12 18:09   ` [PATCH v3 0/3] ref-filter: handle CRLF at end-of-line more gracefully Philippe Blain via GitGitGadget
  2020-10-12 18:09     ` [PATCH v3 1/3] t: add lib-crlf-messages.sh for messages containing CRLF Philippe Blain via GitGitGadget
@ 2020-10-12 18:09     ` Philippe Blain via GitGitGadget
  2020-10-12 22:24       ` Junio C Hamano
  2020-10-12 18:09     ` [PATCH v3 3/3] log, show: add tests for messages containing CRLF Philippe Blain via GitGitGadget
  2020-10-22  3:01     ` [PATCH v4 0/2] ref-filter: handle CRLF at end-of-line more gracefully Philippe Blain via GitGitGadget
  3 siblings, 1 reply; 38+ messages in thread
From: Philippe Blain via GitGitGadget @ 2020-10-12 18:09 UTC (permalink / raw)
  To: git
  Cc: Michael J Gruber, Matthieu Moy, John Keeping, Karthik Nayak,
	Jeff King, Alex Henrie, Philippe Blain, Philippe Blain

From: Philippe Blain <levraiphilippeblain@gmail.com>

The ref-filter code does not correctly handle commit or tag messages that
use CRLF as the line terminator. Such messages can be created with the
`--verbatim` option of `git commit` and `git tag`, or by using `git
commit-tree` directly.

The function `find_subpos` in ref-filter.c looks for two consecutive
LFs to find the end of the subject line, a sequence which is absent in
messages using CRLF. This results in the whole message being parsed as
the subject line (`%(contents:subject)`), and the body of the message
(`%(contents:body)`)  being empty.

Moreover, in `copy_subject`, which wants to return the subject as a
single line, '\n' is replaced by space, but '\r' is
untouched.

This impacts the output of `git branch`, `git tag` and `git
for-each-ref`.

This bug is a regression for `git branch --verbose`, which
bisects down to 949af0684c (branch: use ref-filter printing APIs,
2017-01-10).

Fix this bug in ref-filter by hardening the logic in `copy_subject` and
`find_subpos` to correctly parse messages containing CRLF.

Add tests for `branch`, `tag` and `for-each-ref` using
lib-crlf-messages.sh.

The 'make commits' test at the beginning of t3203-branch-output.sh does
not use `test_tick` and thus the commit hashes are not reproducible. For
simplicity, use `test_commit` to create the commits, as the content and
name of the files created in this setup test are irrelevant to the rest
of the test script.

Use `test_cleanup_crlf_refs` in t3203-branch-output.sh and t7004-tag.sh
to avoid having to modify the expected output in later tests.

Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Philippe Blain <levraiphilippeblain@gmail.com>
---
 ref-filter.c             | 36 +++++++++++++++++++++---------------
 t/t3203-branch-output.sh | 31 ++++++++++++++++++++++++++-----
 t/t6300-for-each-ref.sh  |  5 +++++
 t/t7004-tag.sh           |  7 +++++++
 4 files changed, 59 insertions(+), 20 deletions(-)

diff --git a/ref-filter.c b/ref-filter.c
index c62f6b4822..92d8ca5340 100644
--- a/ref-filter.c
+++ b/ref-filter.c
@@ -1097,14 +1097,18 @@ static const char *copy_email(const char *buf, struct used_atom *atom)
 
 static char *copy_subject(const char *buf, unsigned long len)
 {
-	char *r = xmemdupz(buf, len);
-	int i;
+	struct strbuf sb = STRBUF_INIT;
 
-	for (i = 0; i < len; i++)
-		if (r[i] == '\n')
-			r[i] = ' ';
+	for (int i = 0; i < len; i++) {
+		if (buf[i] == '\r' && i + 1 < len && buf[i + 1] == '\n')
+			continue; /* ignore CR in CRLF */
 
-	return r;
+		if (buf[i] == '\n')
+			strbuf_addch(&sb, ' ');
+		else
+			strbuf_addch(&sb, buf[i]);
+	}
+	return strbuf_detach(&sb, NULL);
 }
 
 static void grab_date(const char *buf, struct atom_value *v, const char *atomname)
@@ -1228,20 +1232,22 @@ static void find_subpos(const char *buf,
 
 	/* subject is first non-empty line */
 	*sub = buf;
-	/* subject goes to first empty line */
-	while (buf < *sig && *buf && *buf != '\n') {
-		eol = strchrnul(buf, '\n');
-		if (*eol)
-			eol++;
-		buf = eol;
-	}
+	/* subject goes to first empty line before signature begins */
+	if ((eol = strstr(*sub, "\n\n"))) {
+		eol = eol < *sig ? eol : *sig;
+	/* check if message uses CRLF */
+	} else if (! (eol = strstr(*sub, "\r\n\r\n"))) {
+		/* treat whole message as subject */
+		eol = strrchr(*sub, '\0');
+	}
+	buf = eol;
 	*sublen = buf - *sub;
 	/* drop trailing newline, if present */
-	if (*sublen && (*sub)[*sublen - 1] == '\n')
+	while (*sublen && ((*sub)[*sublen - 1] == '\n' || (*sub)[*sublen - 1] == '\r'))
 		*sublen -= 1;
 
 	/* skip any empty lines */
-	while (*buf == '\n')
+	while (*buf == '\n' || *buf == '\r')
 		buf++;
 	*body = buf;
 	*bodylen = strlen(buf);
diff --git a/t/t3203-branch-output.sh b/t/t3203-branch-output.sh
index 71818b90f0..c06eca774f 100755
--- a/t/t3203-branch-output.sh
+++ b/t/t3203-branch-output.sh
@@ -3,13 +3,11 @@
 test_description='git branch display tests'
 . ./test-lib.sh
 . "$TEST_DIRECTORY"/lib-terminal.sh
+. "$TEST_DIRECTORY"/lib-crlf-messages.sh
 
 test_expect_success 'make commits' '
-	echo content >file &&
-	git add file &&
-	git commit -m one &&
-	echo content >>file &&
-	git commit -a -m two
+	test_commit one &&
+	test_commit two
 '
 
 test_expect_success 'make branches' '
@@ -95,6 +93,29 @@ test_expect_success 'git branch --ignore-case --list -v pattern shows branch sum
 	awk "{print \$NF}" <tmp >actual &&
 	test_cmp expect actual
 '
+test_create_crlf_refs
+
+test_expect_success 'git branch -v works with CRLF input' '
+	cat >expect <<-EOF &&
+	  two
+	  one
+	  Subject first line
+	  Subject first line
+	  Subject first line Subject second line
+	  Subject first line Subject second line
+	  Subject first line Subject second line
+	  Subject first line Subject second line
+	EOF
+	git branch -v >tmp &&
+	# Remove first two columns, and the line for the currently checked out branch
+	current=$(git branch --show-current) &&
+	grep -v $current <tmp | awk "{\$1=\$2=\"\"}1"  >actual &&
+	test_cmp expect actual
+'
+
+test_crlf_subject_body_and_contents branch --list crlf*
+
+test_cleanup_crlf_refs
 
 test_expect_success 'git branch -v pattern does not show branch summaries' '
 	test_must_fail git branch -v branch*
diff --git a/t/t6300-for-each-ref.sh b/t/t6300-for-each-ref.sh
index b359023189..c30940cf7a 100755
--- a/t/t6300-for-each-ref.sh
+++ b/t/t6300-for-each-ref.sh
@@ -8,6 +8,7 @@ test_description='for-each-ref test'
 . ./test-lib.sh
 . "$TEST_DIRECTORY"/lib-gpg.sh
 . "$TEST_DIRECTORY"/lib-terminal.sh
+. "$TEST_DIRECTORY"/lib-crlf-messages.sh
 
 # Mon Jul 3 23:18:43 2006 +0000
 datestamp=1151968723
@@ -1030,4 +1031,8 @@ test_expect_success 'for-each-ref --ignore-case works on multiple sort keys' '
 	test_cmp expect actual
 '
 
+test_create_crlf_refs
+
+test_crlf_subject_body_and_contents for-each-ref refs/heads/crlf*
+
 test_done
diff --git a/t/t7004-tag.sh b/t/t7004-tag.sh
index 05f411c821..cda735dab4 100755
--- a/t/t7004-tag.sh
+++ b/t/t7004-tag.sh
@@ -10,6 +10,7 @@ Tests for operations with tags.'
 . ./test-lib.sh
 . "$TEST_DIRECTORY"/lib-gpg.sh
 . "$TEST_DIRECTORY"/lib-terminal.sh
+. "$TEST_DIRECTORY"/lib-crlf-messages.sh
 
 # creating and listing lightweight tags:
 
@@ -1970,6 +1971,12 @@ test_expect_success '--format should list tags as per format given' '
 	test_cmp expect actual
 '
 
+test_create_crlf_refs
+
+test_crlf_subject_body_and_contents tag --list tag-crlf*
+
+test_cleanup_crlf_refs
+
 test_expect_success "set up color tests" '
 	echo "<RED>v1.0<RESET>" >expect.color &&
 	echo "v1.0" >expect.bare &&
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH v3 3/3] log, show: add tests for messages containing CRLF
  2020-10-12 18:09   ` [PATCH v3 0/3] ref-filter: handle CRLF at end-of-line more gracefully Philippe Blain via GitGitGadget
  2020-10-12 18:09     ` [PATCH v3 1/3] t: add lib-crlf-messages.sh for messages containing CRLF Philippe Blain via GitGitGadget
  2020-10-12 18:09     ` [PATCH v3 2/3] ref-filter: handle CRLF at end-of-line more gracefully Philippe Blain via GitGitGadget
@ 2020-10-12 18:09     ` Philippe Blain via GitGitGadget
  2020-10-22  3:01     ` [PATCH v4 0/2] ref-filter: handle CRLF at end-of-line more gracefully Philippe Blain via GitGitGadget
  3 siblings, 0 replies; 38+ messages in thread
From: Philippe Blain via GitGitGadget @ 2020-10-12 18:09 UTC (permalink / raw)
  To: git
  Cc: Michael J Gruber, Matthieu Moy, John Keeping, Karthik Nayak,
	Jeff King, Alex Henrie, Philippe Blain, Philippe Blain

From: Philippe Blain <levraiphilippeblain@gmail.com>

A previous commit fixed a bug in ref-filter.c causing messages
containing CRLF to be incorrectly parsed and displayed.

Add tests to also check that `git log` and `git show` correctly handle
such messages, to prevent futur regressions if these commands are
refactored to use the ref-filter API.

To prevent having to modify expected output in further tests, use
'test_cleanup_crlf_refs' in t4202 to clean-up after the added tests.

Signed-off-by: Philippe Blain <levraiphilippeblain@gmail.com>
---
 t/t4202-log.sh  | 18 ++++++++++++++++++
 t/t7007-show.sh |  5 +++++
 2 files changed, 23 insertions(+)

diff --git a/t/t4202-log.sh b/t/t4202-log.sh
index 56d34ed465..d4942a6f92 100755
--- a/t/t4202-log.sh
+++ b/t/t4202-log.sh
@@ -6,6 +6,7 @@ test_description='git log'
 . "$TEST_DIRECTORY/lib-gpg.sh"
 . "$TEST_DIRECTORY/lib-terminal.sh"
 . "$TEST_DIRECTORY/lib-log-graph.sh"
+. "$TEST_DIRECTORY/lib-crlf-messages.sh"
 
 test_cmp_graph () {
 	lib_test_cmp_graph --format=%s "$@"
@@ -105,6 +106,23 @@ test_expect_success 'oneline' '
 	test_cmp expect actual
 '
 
+test_create_crlf_refs
+
+test_expect_success 'oneline with CRLF messages' '
+	for branch in $LIB_CLRF_BRANCHES; do
+		cat .crlf-subject-${branch}.txt >expect &&
+		git log --oneline -1 ${branch} >tmp-branch &&
+		git log --oneline -1 tag-${branch} >tmp-tag &&
+		awk "{print \$NF}" <tmp-branch >actual-branch &&
+		awk "{print \$NF}" <tmp-tag >actual-tag &&
+		test_cmp expect actual-branch &&
+		test_cmp expect actual-tag
+	done
+'
+test_crlf_subject_body_and_contents log --all --reverse --grep Subject
+
+test_cleanup_crlf_refs
+
 test_expect_success 'diff-filter=A' '
 
 	git log --no-renames --pretty="format:%s" --diff-filter=A HEAD > actual &&
diff --git a/t/t7007-show.sh b/t/t7007-show.sh
index 42d3db6246..5ffe852829 100755
--- a/t/t7007-show.sh
+++ b/t/t7007-show.sh
@@ -3,6 +3,7 @@
 test_description='git show'
 
 . ./test-lib.sh
+. "$TEST_DIRECTORY/lib-crlf-messages.sh"
 
 test_expect_success setup '
 	echo hello world >foo &&
@@ -128,4 +129,8 @@ test_expect_success 'show --graph is forbidden' '
   test_must_fail git show --graph HEAD
 '
 
+test_create_crlf_refs
+
+test_crlf_subject_body_and_contents show $LIB_CRLF_BRANCHES
+
 test_done
-- 
gitgitgadget

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* Re: [PATCH v3 1/3] t: add lib-crlf-messages.sh for messages containing CRLF
  2020-10-12 18:09     ` [PATCH v3 1/3] t: add lib-crlf-messages.sh for messages containing CRLF Philippe Blain via GitGitGadget
@ 2020-10-12 22:22       ` Junio C Hamano
  2020-10-14 13:22         ` Philippe Blain
  2020-10-12 22:47       ` Eric Sunshine
  1 sibling, 1 reply; 38+ messages in thread
From: Junio C Hamano @ 2020-10-12 22:22 UTC (permalink / raw)
  To: Philippe Blain via GitGitGadget
  Cc: git, Michael J Gruber, Matthieu Moy, John Keeping, Karthik Nayak,
	Jeff King, Alex Henrie, Philippe Blain

"Philippe Blain via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Philippe Blain <levraiphilippeblain@gmail.com>
>
> A following commit will fix a bug in the ref-filter API that causes
> commit and tag messages containing CRLF to be incorrectly parsed and
> displayed.
>
> Add a test library (t/lib-crlf-messages.sh) that creates refs with such
> commit messages, so that we can easily test that this bug does not
> appear in other commands in the future.
> ...
> The function `test_crlf_subject_body_and_contents` can be used to test
> that the `--format` option of `branch`, `tag`, `for-each-ref` and
> `log` correctly displays the subject, body and raw content of commits and
> tag messages.

I am not sure about the wisdom of this arrangement.  Surely you do
not want to write duplicated set-up for (existing) test scripts for
for-each-ref, branch and tag subcommands, assuming that these test
scripts are separated for subcommands they test.

But you can have a single test script, that is differentiated from
all other test scripts by what it tests: having to deal with commits
that use CRLF.  Then we do not have to add dot-includable test
library that lets various tests to create these same funny commits.
Instead, we can just do these as normal set-up step(s) for that
single test scripts, and then in that test scripts, verify what is
shown by various commands that share the underlying ref-filter
machinery.  No?


> diff --git a/t/lib-crlf-messages.sh b/t/lib-crlf-messages.sh
> new file mode 100644
> index 0000000000..10a2b57280
> --- /dev/null
> +++ b/t/lib-crlf-messages.sh
> @@ -0,0 +1,90 @@
> +# Setup refs with commit and tag messages containing CRLF
> +
> +LIB_CRLF_BRANCHES=""
> +
> +create_crlf_ref () {
> +	message="$1" &&
> +	subject="$2" &&
> +	body="$3" &&
> +	branch="$4" &&
> +	printf "${message}" >.crlf-message-${branch}.txt &&
> +	printf "${subject}" >.crlf-subject-${branch}.txt &&
> +	printf "${body}" >.crlf-body-${branch}.txt &&
> +	LIB_CRLF_BRANCHES="${LIB_CRLF_BRANCHES} ${branch}"
> +	test_tick &&
> +	hash=$(git commit-tree HEAD^{tree} -p HEAD -F .crlf-message-${branch}.txt) &&
> +	git branch ${branch} ${hash} &&
> +	git tag tag-${branch} ${branch} -F .crlf-message-${branch}.txt --cleanup=verbatim
> +}
> +
> +create_crlf_refs () {
> +	message="Subject first line\r\n\r\nBody first line\r\nBody second line\r\n" &&
> +	body="Body first line\r\nBody second line\r\n" &&
> +	subject="Subject first line" &&
> +	branch="crlf" &&
> +	create_crlf_ref "${message}" "${subject}" "${body}" "${branch}" &&
> +	message="Subject first line\r\n\r\n\r\nBody first line\r\nBody second line\r\n" &&
> +	branch="crlf-empty-lines-after-subject" &&
> +	create_crlf_ref "${message}" "${subject}" "${body}" "${branch}" &&
> +	message="Subject first line\r\nSubject second line\r\n\r\nBody first line\r\nBody second line\r\n" &&
> +	subject="Subject first line Subject second line" &&
> +	branch="crlf-two-line-subject" &&
> +	create_crlf_ref "${message}" "${subject}" "${body}" "${branch}" &&
> +	message="Subject first line\r\nSubject second line" &&
> +	subject="Subject first line Subject second line" &&
> +	body="" &&
> +	branch="crlf-two-line-subject-no-body" &&
> +	create_crlf_ref "${message}" "${subject}" "${body}" "${branch}" &&
> +	message="Subject first line\r\nSubject second line\r\n" &&
> +	branch="crlf-two-line-subject-no-body-trailing-newline" &&
> +	create_crlf_ref "${message}" "${subject}" "${body}" "${branch}" &&
> +	message="Subject first line\r\nSubject second line\r\n\r" &&
> +	branch="crlf-two-line-subject-no-body-trailing-newline2" &&
> +	create_crlf_ref "${message}" "${subject}" "${body}" "${branch}"
> +}
> +
> +test_create_crlf_refs () {
> +	test_expect_success 'setup refs with CRLF commit messages' '
> +		create_crlf_refs
> +	'
> +}
> +
> +cleanup_crlf_refs () {
> +	for branch in ${LIB_CRLF_BRANCHES}; do
> +		git branch -D ${branch} &&
> +		git tag -d tag-${branch} &&
> +		rm .crlf-message-${branch}.txt &&
> +		rm .crlf-subject-${branch}.txt &&
> +		rm .crlf-body-${branch}.txt
> +	done
> +}
> +
> +test_cleanup_crlf_refs () {
> +	test_expect_success 'cleanup refs with CRLF commit messages' '
> +		cleanup_crlf_refs
> +	'
> +}
> +
> +test_crlf_subject_body_and_contents() {

It does not excempt a script from being subject to the coding
guidelines to be a test library.

> +	command_and_args="$@" &&
> +	command=$1 &&
> +	if [ ${command} = "branch" ] || [ ${command} = "for-each-ref" ] || [ ${command} = "tag" ]; then
> +		atoms="(contents:subject) (contents:body) (contents)"
> +	elif [ ${command} = "log" ] || [ ${command} = "show" ]; then
> +		atoms="s b B"
> +	fi &&

This is the part that made me react to the organization.  Even
though this helper "library" pretends to be generic, it needs to
actually know exactly what subcommands are going to be tested with
the helper.  It is probably easier to read and understand if these
helper functions are defined in the same script as the one that
tests these various commands but for one specific aspect of these
commands (i.e. how the log message with funny line ending convention
are split into subject and body).

> +	files="subject body message" &&
> +	while  [ -n "${atoms}" ]; do
> +		set ${atoms} && atom=$1 && shift && atoms="$*" &&
> +		set ${files} &&	file=$1 && shift && files="$*" &&
> +		test_expect_success "${command}: --format='%${atom}' works with CRLF input" "
> +			rm -f expect &&
> +			for ref in ${LIB_CRLF_BRANCHES}; do
> +				cat .crlf-${file}-\"\${ref}\".txt >>expect &&
> +				printf \"\n\" >>expect
> +			done &&
> +			git $command_and_args --format=\"%${atom}\" >actual &&
> +			test_cmp expect actual
> +		"
> +	done
> +}

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v3 2/3] ref-filter: handle CRLF at end-of-line more gracefully
  2020-10-12 18:09     ` [PATCH v3 2/3] ref-filter: handle CRLF at end-of-line more gracefully Philippe Blain via GitGitGadget
@ 2020-10-12 22:24       ` Junio C Hamano
  2020-10-14 13:09         ` Philippe Blain
  0 siblings, 1 reply; 38+ messages in thread
From: Junio C Hamano @ 2020-10-12 22:24 UTC (permalink / raw)
  To: Philippe Blain via GitGitGadget
  Cc: git, Michael J Gruber, Matthieu Moy, John Keeping, Karthik Nayak,
	Jeff King, Alex Henrie, Philippe Blain

"Philippe Blain via GitGitGadget" <gitgitgadget@gmail.com> writes:

> -	for (i = 0; i < len; i++)
> -		if (r[i] == '\n')
> -			r[i] = ' ';
> +	for (int i = 0; i < len; i++) {

We do not allow this in our codebase (yet).

cf. Documentation/CodingGuidelines

 - Declaring a variable in the for loop "for (int i = 0; i < 10; i++)"
   is still not allowed in this codebase.

> diff --git a/t/t3203-branch-output.sh b/t/t3203-branch-output.sh
> index 71818b90f0..c06eca774f 100755
> --- a/t/t3203-branch-output.sh
> +++ b/t/t3203-branch-output.sh
> @@ -3,13 +3,11 @@
>  test_description='git branch display tests'
>  . ./test-lib.sh
>  . "$TEST_DIRECTORY"/lib-terminal.sh
> +. "$TEST_DIRECTORY"/lib-crlf-messages.sh
>  
>  test_expect_success 'make commits' '
> -	echo content >file &&
> -	git add file &&
> -	git commit -m one &&
> -	echo content >>file &&
> -	git commit -a -m two
> +	test_commit one &&
> +	test_commit two
>  '

What does this change have to do with the topic at hand?

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v3 1/3] t: add lib-crlf-messages.sh for messages containing CRLF
  2020-10-12 18:09     ` [PATCH v3 1/3] t: add lib-crlf-messages.sh for messages containing CRLF Philippe Blain via GitGitGadget
  2020-10-12 22:22       ` Junio C Hamano
@ 2020-10-12 22:47       ` Eric Sunshine
  2020-10-14 13:20         ` Philippe Blain
  2020-10-22  3:09         ` Philippe Blain
  1 sibling, 2 replies; 38+ messages in thread
From: Eric Sunshine @ 2020-10-12 22:47 UTC (permalink / raw)
  To: Philippe Blain via GitGitGadget
  Cc: git, Michael J Gruber, Matthieu Moy, John Keeping, Karthik Nayak,
	Jeff King, Alex Henrie, Philippe Blain

On Mon, Oct 12, 2020 at 06:09:27PM +0000, Philippe Blain via GitGitGadget wrote:
> Add a test library (t/lib-crlf-messages.sh) that creates refs with such
> commit messages, so that we can easily test that this bug does not
> appear in other commands in the future.

In addition to Junio's review comments...

> 
> Signed-off-by: Philippe Blain <levraiphilippeblain@gmail.com>
> ---
> diff --git a/t/lib-crlf-messages.sh b/t/lib-crlf-messages.sh
> @@ -0,0 +1,90 @@
> +create_crlf_ref () {
> +	message="$1" &&
> +	subject="$2" &&
> +	body="$3" &&
> +	branch="$4" &&
> +	printf "${message}" >.crlf-message-${branch}.txt &&
> +	printf "${subject}" >.crlf-subject-${branch}.txt &&
> +	printf "${body}" >.crlf-body-${branch}.txt &&
> +	LIB_CRLF_BRANCHES="${LIB_CRLF_BRANCHES} ${branch}"

Broken &&-chain.

> +	test_tick &&
> +	hash=$(git commit-tree HEAD^{tree} -p HEAD -F .crlf-message-${branch}.txt) &&
> +	git branch ${branch} ${hash} &&
> +	git tag tag-${branch} ${branch} -F .crlf-message-${branch}.txt --cleanup=verbatim
> +}
> +
> +create_crlf_refs () {
> +	message="Subject first line\r\n\r\nBody first line\r\nBody second line\r\n" &&
> +	body="Body first line\r\nBody second line\r\n" &&
> +	subject="Subject first line" &&
> +	branch="crlf" &&
> +	create_crlf_ref "${message}" "${subject}" "${body}" "${branch}" &&

This is somewhat onerous to digest and compose. Have you considered
making it more automated and easier to read? Perhaps something like
this:

    create_crlf_ref () {
        branch=$1
        cat >.crlf-message-$branch.txt &&
        sed -n "1,/^$/p" <.crlf-message-$branch.txt | sed "/^$/d" | append_cr >.crlf-subject-$branch.txt &&
        sed -n "/^$/,\$p" <.crlf-message-$branch.txt | sed "1d" | append_cr >.crlf-body-$branch.txt &&
        ...
    }

    create_crlf_refs () {
        create_crlf_ref crlf <<-\EOF
        Subject first line

        Body first line
        Body second line
        EOF
        ...
    }

> +test_create_crlf_refs () {
> +	test_expect_success 'setup refs with CRLF commit messages' '
> +		create_crlf_refs
> +	'
> +}

This almost seems like an unnecessary indirection since callers could
just as easily do this on their own, like this:

    test_expect_success 'setup refs with CRLF commit messages' '
        create_crlf_refs
    '

which isn't very burdensome. However, I suppose doing it this way
gives consistent test titles between scripts, so not necessarily a
strong objection on my part.

> +cleanup_crlf_refs () {
> +	for branch in ${LIB_CRLF_BRANCHES}; do

Our style is to place 'do' on its own line:

    for branch in $LIB_CRLF_BRANCHES
    do
        ...

This would be a syntax error if LIB_CRLF_BRANCHES is empty for some
reason, but I suppose we don't really have to worry about it here(?).

> +		git branch -D ${branch} &&
> +		git tag -d tag-${branch} &&
> +		rm .crlf-message-${branch}.txt &&
> +		rm .crlf-subject-${branch}.txt &&
> +		rm .crlf-body-${branch}.txt
> +	done
> +}
> +
> +test_cleanup_crlf_refs () {
> +	test_expect_success 'cleanup refs with CRLF commit messages' '
> +		cleanup_crlf_refs
> +	'
> +}
> +
> +test_crlf_subject_body_and_contents() {
> +	command_and_args="$@" &&
> +	command=$1 &&
> +	if [ ${command} = "branch" ] || [ ${command} = "for-each-ref" ] || [ ${command} = "tag" ]; then
> +		atoms="(contents:subject) (contents:body) (contents)"
> +	elif [ ${command} = "log" ] || [ ${command} = "show" ]; then
> +		atoms="s b B"
> +	fi &&

Style:

    if test "$command" = "branch" || test ...
    then
        ...
    elif test ...
    then
        ...
    fi &&

> +	files="subject body message" &&
> +	while  [ -n "${atoms}" ]; do

Too many spaces after 'while'.

Style:

    while tests -n "..."
    do
        ...

> +		set ${atoms} && atom=$1 && shift && atoms="$*" &&
> +		set ${files} &&	file=$1 && shift && files="$*" &&
> +		test_expect_success "${command}: --format='%${atom}' works with CRLF input" "
> +			rm -f expect &&
> +			for ref in ${LIB_CRLF_BRANCHES}; do

Style.

> +				cat .crlf-${file}-\"\${ref}\".txt >>expect &&
> +				printf \"\n\" >>expect
> +			done &&
> +			git $command_and_args --format=\"%${atom}\" >actual &&
> +			test_cmp expect actual
> +		"
> +	done
> +}

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v3 2/3] ref-filter: handle CRLF at end-of-line more gracefully
  2020-10-12 22:24       ` Junio C Hamano
@ 2020-10-14 13:09         ` Philippe Blain
  0 siblings, 0 replies; 38+ messages in thread
From: Philippe Blain @ 2020-10-14 13:09 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Philippe Blain via GitGitGadget, Git mailing list,
	Michael J Gruber, Matthieu Moy, John Keeping, Karthik Nayak,
	Jeff King, Alex Henrie

Hi Junio,

> Le 12 oct. 2020 à 18:24, Junio C Hamano <gitster@pobox.com> a écrit :
> 
> "Philippe Blain via GitGitGadget" <gitgitgadget@gmail.com> writes:
> 
>> -	for (i = 0; i < len; i++)
>> -		if (r[i] == '\n')
>> -			r[i] = ' ';
>> +	for (int i = 0; i < len; i++) {
> 
> We do not allow this in our codebase (yet).
> 
> cf. Documentation/CodingGuidelines
> 
> - Declaring a variable in the for loop "for (int i = 0; i < 10; i++)"
>   is still not allowed in this codebase.

Indeed. Will fix (and re-read the guidelines).

> 
>> diff --git a/t/t3203-branch-output.sh b/t/t3203-branch-output.sh
>> index 71818b90f0..c06eca774f 100755
>> --- a/t/t3203-branch-output.sh
>> +++ b/t/t3203-branch-output.sh
>> @@ -3,13 +3,11 @@
>> test_description='git branch display tests'
>> . ./test-lib.sh
>> . "$TEST_DIRECTORY"/lib-terminal.sh
>> +. "$TEST_DIRECTORY"/lib-crlf-messages.sh
>> 
>> test_expect_success 'make commits' '
>> -	echo content >file &&
>> -	git add file &&
>> -	git commit -m one &&
>> -	echo content >>file &&
>> -	git commit -a -m two
>> +	test_commit one &&
>> +	test_commit two
>> '
> 
> What does this change have to do with the topic at hand?

In the previous iteration, the expected output of one of the tests
I was adding had commit hashes, 
so the change above was necessary to make those hashes reproducible.
However in this series I removed the hashes from the expected output
because it would break the "linux-clang" CI job which runs with SHA-256.

Maybe we should add a note in t/README specifically saying that raw commit
hashes should not appear in expected output, but use $(git rev-parse "${ref}")
instead ? Could this be a preparatory patch, or would deserve a separate topic ?


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v3 1/3] t: add lib-crlf-messages.sh for messages containing CRLF
  2020-10-12 22:47       ` Eric Sunshine
@ 2020-10-14 13:20         ` Philippe Blain
  2020-10-14 13:45           ` Eric Sunshine
  2020-10-22  3:09         ` Philippe Blain
  1 sibling, 1 reply; 38+ messages in thread
From: Philippe Blain @ 2020-10-14 13:20 UTC (permalink / raw)
  To: Eric Sunshine
  Cc: Philippe Blain via GitGitGadget, Git mailing list,
	Michael J Gruber, Matthieu Moy, John Keeping, Karthik Nayak,
	Jeff King, Alex Henrie

Hi Eric,

> Le 12 oct. 2020 à 18:47, Eric Sunshine <sunshine@sunshineco.com> a écrit :
> 
> On Mon, Oct 12, 2020 at 06:09:27PM +0000, Philippe Blain via GitGitGadget wrote:
>> Add a test library (t/lib-crlf-messages.sh) that creates refs with such
>> commit messages, so that we can easily test that this bug does not
>> appear in other commands in the future.
> 
> In addition to Junio's review comments...
> 
>> 
>> Signed-off-by: Philippe Blain <levraiphilippeblain@gmail.com>
>> ---
>> diff --git a/t/lib-crlf-messages.sh b/t/lib-crlf-messages.sh
>> @@ -0,0 +1,90 @@
>> +create_crlf_ref () {
>> +	message="$1" &&
>> +	subject="$2" &&
>> +	body="$3" &&
>> +	branch="$4" &&
>> +	printf "${message}" >.crlf-message-${branch}.txt &&
>> +	printf "${subject}" >.crlf-subject-${branch}.txt &&
>> +	printf "${body}" >.crlf-body-${branch}.txt &&
>> +	LIB_CRLF_BRANCHES="${LIB_CRLF_BRANCHES} ${branch}"
> 
> Broken &&-chain.
> 
>> +	test_tick &&
>> +	hash=$(git commit-tree HEAD^{tree} -p HEAD -F .crlf-message-${branch}.txt) &&
>> +	git branch ${branch} ${hash} &&
>> +	git tag tag-${branch} ${branch} -F .crlf-message-${branch}.txt --cleanup=verbatim
>> +}
>> +
>> +create_crlf_refs () {
>> +	message="Subject first line\r\n\r\nBody first line\r\nBody second line\r\n" &&
>> +	body="Body first line\r\nBody second line\r\n" &&
>> +	subject="Subject first line" &&
>> +	branch="crlf" &&
>> +	create_crlf_ref "${message}" "${subject}" "${body}" "${branch}" &&
> 
> This is somewhat onerous to digest and compose. Have you considered
> making it more automated and easier to read? Perhaps something like
> this:
> 
>    create_crlf_ref () {
>        branch=$1
>        cat >.crlf-message-$branch.txt &&
>        sed -n "1,/^$/p" <.crlf-message-$branch.txt | sed "/^$/d" | append_cr >.crlf-subject-$branch.txt &&
>        sed -n "/^$/,\$p" <.crlf-message-$branch.txt | sed "1d" | append_cr >.crlf-body-$branch.txt &&
>        ...
>    }
> 
>    create_crlf_refs () {
>        create_crlf_ref crlf <<-\EOF
>        Subject first line
> 
>        Body first line
>        Body second line
>        EOF
>        ...
>    }

I did not try to do that because I did not think of it. 
However, I think it's clearer using printf, this way '\n' and '\r'
appear clearly on all platforms, whatever editor is in use
and whatever settings this editor is using to hide or not hide
control characters.

> 
>> +test_create_crlf_refs () {
>> +	test_expect_success 'setup refs with CRLF commit messages' '
>> +		create_crlf_refs
>> +	'
>> +}
> 
> This almost seems like an unnecessary indirection since callers could
> just as easily do this on their own, like this:
> 
>    test_expect_success 'setup refs with CRLF commit messages' '
>        create_crlf_refs
>    '
> 
> which isn't very burdensome. However, I suppose doing it this way
> gives consistent test titles between scripts, so not necessarily a
> strong objection on my part.

Yes, that was the reason. Given Junio's comments I'll surely
refactor his library into a script, so there won't be a need for
this indirection.

> 
>> +cleanup_crlf_refs () {
>> +	for branch in ${LIB_CRLF_BRANCHES}; do
> 
> Our style is to place 'do' on its own line:
> 
>    for branch in $LIB_CRLF_BRANCHES
>    do
>        ...
> 
> This would be a syntax error if LIB_CRLF_BRANCHES is empty for some
> reason, but I suppose we don't really have to worry about it here(?).
> 
>> +		git branch -D ${branch} &&
>> +		git tag -d tag-${branch} &&
>> +		rm .crlf-message-${branch}.txt &&
>> +		rm .crlf-subject-${branch}.txt &&
>> +		rm .crlf-body-${branch}.txt
>> +	done
>> +}
>> +
>> +test_cleanup_crlf_refs () {
>> +	test_expect_success 'cleanup refs with CRLF commit messages' '
>> +		cleanup_crlf_refs
>> +	'
>> +}
>> +
>> +test_crlf_subject_body_and_contents() {
>> +	command_and_args="$@" &&
>> +	command=$1 &&
>> +	if [ ${command} = "branch" ] || [ ${command} = "for-each-ref" ] || [ ${command} = "tag" ]; then
>> +		atoms="(contents:subject) (contents:body) (contents)"
>> +	elif [ ${command} = "log" ] || [ ${command} = "show" ]; then
>> +		atoms="s b B"
>> +	fi &&
> 
> Style:
> 
>    if test "$command" = "branch" || test ...
>    then
>        ...
>    elif test ...
>    then
>        ...
>    fi &&
> 
>> +	files="subject body message" &&
>> +	while  [ -n "${atoms}" ]; do
> 
> Too many spaces after 'while'.
> 
> Style:
> 
>    while tests -n "..."
>    do
>        ...
> 
>> +		set ${atoms} && atom=$1 && shift && atoms="$*" &&
>> +		set ${files} &&	file=$1 && shift && files="$*" &&
>> +		test_expect_success "${command}: --format='%${atom}' works with CRLF input" "
>> +			rm -f expect &&
>> +			for ref in ${LIB_CRLF_BRANCHES}; do
> 
> Style.
> 
>> +				cat .crlf-${file}-\"\${ref}\".txt >>expect &&
>> +				printf \"\n\" >>expect
>> +			done &&
>> +			git $command_and_args --format=\"%${atom}\" >actual &&
>> +			test_cmp expect actual
>> +		"
>> +	done
>> +}

Thanks for the review! (and I'll re-read the shell coding guidelines
before sending v4).

Cheers,
Philippe.


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v3 1/3] t: add lib-crlf-messages.sh for messages containing CRLF
  2020-10-12 22:22       ` Junio C Hamano
@ 2020-10-14 13:22         ` Philippe Blain
  0 siblings, 0 replies; 38+ messages in thread
From: Philippe Blain @ 2020-10-14 13:22 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Philippe Blain via GitGitGadget, Git mailing list,
	Michael J Gruber, Matthieu Moy, John Keeping, Karthik Nayak,
	Jeff King, Alex Henrie

Hi Junio,

> Le 12 oct. 2020 à 18:22, Junio C Hamano <gitster@pobox.com> a écrit :
> 
> "Philippe Blain via GitGitGadget" <gitgitgadget@gmail.com> writes:
> 
>> From: Philippe Blain <levraiphilippeblain@gmail.com>
>> 
>> A following commit will fix a bug in the ref-filter API that causes
>> commit and tag messages containing CRLF to be incorrectly parsed and
>> displayed.
>> 
>> Add a test library (t/lib-crlf-messages.sh) that creates refs with such
>> commit messages, so that we can easily test that this bug does not
>> appear in other commands in the future.
>> ...
>> The function `test_crlf_subject_body_and_contents` can be used to test
>> that the `--format` option of `branch`, `tag`, `for-each-ref` and
>> `log` correctly displays the subject, body and raw content of commits and
>> tag messages.
> 
> I am not sure about the wisdom of this arrangement.  Surely you do
> not want to write duplicated set-up for (existing) test scripts for
> for-each-ref, branch and tag subcommands, assuming that these test
> scripts are separated for subcommands they test.
> 
> But you can have a single test script, that is differentiated from
> all other test scripts by what it tests: having to deal with commits
> that use CRLF.  Then we do not have to add dot-includable test
> library that lets various tests to create these same funny commits.
> Instead, we can just do these as normal set-up step(s) for that
> single test scripts, and then in that test scripts, verify what is
> shown by various commands that share the underlying ref-filter
> machinery.  No?

Yes. I was thinking that it made more sense for the tests
to be in existing test scripts, but if you feel a separate test
script is warranted for these tests, I'll do that instead.

Thanks,

Philippe.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v3 1/3] t: add lib-crlf-messages.sh for messages containing CRLF
  2020-10-14 13:20         ` Philippe Blain
@ 2020-10-14 13:45           ` Eric Sunshine
  2020-10-14 13:52             ` Philippe Blain
  0 siblings, 1 reply; 38+ messages in thread
From: Eric Sunshine @ 2020-10-14 13:45 UTC (permalink / raw)
  To: Philippe Blain
  Cc: Philippe Blain via GitGitGadget, Git mailing list,
	Michael J Gruber, Matthieu Moy, John Keeping, Karthik Nayak,
	Jeff King, Alex Henrie

On Wed, Oct 14, 2020 at 9:20 AM Philippe Blain
<levraiphilippeblain@gmail.com> wrote:
> > Le 12 oct. 2020 à 18:47, Eric Sunshine <sunshine@sunshineco.com> a écrit :
> > This is somewhat onerous to digest and compose. Have you considered
> > making it more automated and easier to read? Perhaps something like
> > this:
> >
> >    create_crlf_ref () {
> >        branch=$1
> >        cat >.crlf-message-$branch.txt &&
> >        sed -n "1,/^$/p" <.crlf-message-$branch.txt | sed "/^$/d" | append_cr >.crlf-subject-$branch.txt &&
> >        sed -n "/^$/,\$p" <.crlf-message-$branch.txt | sed "1d" | append_cr >.crlf-body-$branch.txt &&
> >        ...
> >    }
> >
> >    create_crlf_refs () {
> >        create_crlf_ref crlf <<-\EOF
> >        Subject first line
> >
> >        Body first line
> >        Body second line
> >        EOF
> >        ...
> >    }
>
> I did not try to do that because I did not think of it.
> However, I think it's clearer using printf, this way '\n' and '\r'
> appear clearly on all platforms, whatever editor is in use
> and whatever settings this editor is using to hide or not hide
> control characters.

Sorry, I'm not sure I understand what you are saying about editors and
hiding or not hiding control characters. There are no hidden control
characters in the example code I posted.

The code I proposed is very explicit about using CRLF terminators. The
here-doc fed to create_crlf_ref() contains only the normal LF, but
then create_crlf_ref() explicitly converts those to CRLF by calling
append_cr().

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v3 1/3] t: add lib-crlf-messages.sh for messages containing CRLF
  2020-10-14 13:45           ` Eric Sunshine
@ 2020-10-14 13:52             ` Philippe Blain
  2020-10-14 23:01               ` Eric Sunshine
  0 siblings, 1 reply; 38+ messages in thread
From: Philippe Blain @ 2020-10-14 13:52 UTC (permalink / raw)
  To: Eric Sunshine
  Cc: Philippe Blain via GitGitGadget, Git mailing list,
	Michael J Gruber, Matthieu Moy, John Keeping, Karthik Nayak,
	Jeff King, Alex Henrie


> Le 14 oct. 2020 à 09:45, Eric Sunshine <sunshine@sunshineco.com> a écrit :
> 
> On Wed, Oct 14, 2020 at 9:20 AM Philippe Blain
> <levraiphilippeblain@gmail.com> wrote:
>>> Le 12 oct. 2020 à 18:47, Eric Sunshine <sunshine@sunshineco.com> a écrit :
>>> This is somewhat onerous to digest and compose. Have you considered
>>> making it more automated and easier to read? Perhaps something like
>>> this:
>>> 
>>>   create_crlf_ref () {
>>>       branch=$1
>>>       cat >.crlf-message-$branch.txt &&
>>>       sed -n "1,/^$/p" <.crlf-message-$branch.txt | sed "/^$/d" | append_cr >.crlf-subject-$branch.txt &&
>>>       sed -n "/^$/,\$p" <.crlf-message-$branch.txt | sed "1d" | append_cr >.crlf-body-$branch.txt &&
>>>       ...
>>>   }
>>> 
>>>   create_crlf_refs () {
>>>       create_crlf_ref crlf <<-\EOF
>>>       Subject first line
>>> 
>>>       Body first line
>>>       Body second line
>>>       EOF
>>>       ...
>>>   }
>> 
>> I did not try to do that because I did not think of it.
>> However, I think it's clearer using printf, this way '\n' and '\r'
>> appear clearly on all platforms, whatever editor is in use
>> and whatever settings this editor is using to hide or not hide
>> control characters.
> 
> Sorry, I'm not sure I understand what you are saying about editors and
> hiding or not hiding control characters. There are no hidden control
> characters in the example code I posted.
> 
> The code I proposed is very explicit about using CRLF terminators. The
> here-doc fed to create_crlf_ref() contains only the normal LF, but
> then create_crlf_ref() explicitly converts those to CRLF by calling
> append_cr().

Sorry, I missed that. I'll try to see if I can make it simpler using
this approach then. 


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v3 1/3] t: add lib-crlf-messages.sh for messages containing CRLF
  2020-10-14 13:52             ` Philippe Blain
@ 2020-10-14 23:01               ` Eric Sunshine
  0 siblings, 0 replies; 38+ messages in thread
From: Eric Sunshine @ 2020-10-14 23:01 UTC (permalink / raw)
  To: Philippe Blain
  Cc: Philippe Blain via GitGitGadget, Git mailing list,
	Michael J Gruber, Matthieu Moy, John Keeping, Karthik Nayak,
	Jeff King, Alex Henrie

On Wed, Oct 14, 2020 at 9:52 AM Philippe Blain <levraiphilippeblain@gmail.com> wrote:
> > Le 14 oct. 2020 à 09:45, Eric Sunshine <sunshine@sunshineco.com> a écrit :
> >>> Le 12 oct. 2020 à 18:47, Eric Sunshine <sunshine@sunshineco.com> a écrit :
> >>>  create_crlf_ref () {
> >>>    branch=$1
> >>>    cat >.crlf-message-$branch.txt &&
> >>>    sed -n "1,/^$/p" <.crlf-message-$branch.txt | sed "/^$/d" | append_cr >.crlf-subject-$branch.txt &&
> >>>    sed -n "/^$/,\$p" <.crlf-message-$branch.txt | sed "1d" | append_cr >.crlf-body-$branch.txt &&
> >
> > The code I proposed is very explicit about using CRLF terminators. The
> > here-doc fed to create_crlf_ref() contains only the normal LF, but
> > then create_crlf_ref() explicitly converts those to CRLF by calling
> > append_cr().
>
> Sorry, I missed that. I'll try to see if I can make it simpler using
> this approach then.

By the way, if you also need .crlf-message-$branch.txt to have CRLF
line endings, then you'll probably want to use a temporary file (for
instance, .crlf-orig-$branch.txt), perhaps like this:

    create_crlf_ref () {
        branch=$1 &&
        cat >.crlf-orig-$branch.txt &&
        append_cr <.crlf-orig-$branch.txt >.crlf-message-$branch.txt &&
        sed -n "1,/^$/p" <.crlf-orig-$branch.txt | sed "/^$/d" | append_cr >.crlf-subject-$branch.txt &&
        sed -n "/^$/,\$p" <.crlf-orig-$branch.txt | sed "1d" | append_cr >.crlf-body-$branch.txt &&
        ...
    }

^ permalink raw reply	[flat|nested] 38+ messages in thread

* [PATCH v4 0/2] ref-filter: handle CRLF at end-of-line more gracefully
  2020-10-12 18:09   ` [PATCH v3 0/3] ref-filter: handle CRLF at end-of-line more gracefully Philippe Blain via GitGitGadget
                       ` (2 preceding siblings ...)
  2020-10-12 18:09     ` [PATCH v3 3/3] log, show: add tests for messages containing CRLF Philippe Blain via GitGitGadget
@ 2020-10-22  3:01     ` Philippe Blain via GitGitGadget
  2020-10-22  3:01       ` [PATCH v4 1/2] " Philippe Blain via GitGitGadget
                         ` (2 more replies)
  3 siblings, 3 replies; 38+ messages in thread
From: Philippe Blain via GitGitGadget @ 2020-10-22  3:01 UTC (permalink / raw)
  To: git
  Cc: Michael J Gruber, Matthieu Moy, John Keeping, Karthik Nayak,
	Jeff King, Alex Henrie, Eric Sunshine, Philippe Blain

Changes since v3 :

 * removed lib-crlf-messages.sh and the changes to the existing test
   scripts, and instead added a new test script : t3920-crlf-messages.sh,
   and adjusted the commit message of 1/2 (which was 2/3 in v3) accordingly.
 * tweaked the code and the test script to be in line with the C and shell
   coding guidelines, following comments by Junio and Eric
 * made the generation of the crlf-* refs and helper files in the test
   script more readable and automated, as suggested by Eric.

Notes: 

 1. I was really unsure about which number to choose for the new script, so
    I chose one available at the end of section 3 ("the other basic commands
    " according to t/README), so I'll change that if we come up with a
    better number.
 2. the automated range-diff generated by Gitgitgadget (at the end of the
    cover) will be mostly useless because of the refactoring of the test
    library into a test script. So I'm including a few (manually edited)
    blob diffs (git diff v3:file v4:other-file) below to make reviewing
    easier.


----------------------------------------------------------------------------

$ git diff pr-576/phil-blain/fix-branch-verbose-crlf-v3:t/lib-crlf-messages.sh HEAD:t/t3920-crlf-messages.sh
diff --git a/t/lib-crlf-messages.sh b/t/t3920-crlf-messages.sh
old mode 100644
new mode 100755
index 10a2b57280..7a13d6b38a
--- a/t/lib-crlf-messages.sh
+++ b/t/t3920-crlf-messages.sh
@@ -1,16 +1,17 @@
-# Setup refs with commit and tag messages containing CRLF
+#!/bin/sh
+
+test_description='Test ref-filter and pretty APIs for commit and tag messages using CRLF'
+. ./test-lib.sh

 LIB_CRLF_BRANCHES=""

 create_crlf_ref () {
-    message="$1" &&
-    subject="$2" &&
-    body="$3" &&
-    branch="$4" &&
-    printf "${message}" >.crlf-message-${branch}.txt &&
-    printf "${subject}" >.crlf-subject-${branch}.txt &&
-    printf "${body}" >.crlf-body-${branch}.txt &&
-    LIB_CRLF_BRANCHES="${LIB_CRLF_BRANCHES} ${branch}"
+    branch="$1" &&
+    cat >.crlf-orig-$branch.txt &&
+    cat .crlf-orig-$branch.txt | append_cr >.crlf-message-$branch.txt &&
+    grep 'Subject' .crlf-orig-$branch.txt | tr '\n' ' ' | sed 's/[ ]*$//' | tr -d '\n' >.crlf-subject-$branch.txt &&
+    grep 'Body' .crlf-message-$branch.txt >.crlf-body-$branch.txt || true &&
+    LIB_CRLF_BRANCHES="${LIB_CRLF_BRANCHES} ${branch}" &&
     test_tick &&
     hash=$(git commit-tree HEAD^{tree} -p HEAD -F .crlf-message-${branch}.txt) &&
     git branch ${branch} ${hash} &&
@@ -18,73 +19,108 @@ create_crlf_ref () {
 }

 create_crlf_refs () {
-    message="Subject first line\r\n\r\nBody first line\r\nBody second line\r\n" &&
-    body="Body first line\r\nBody second line\r\n" &&
-    subject="Subject first line" &&
-    branch="crlf" &&
-    create_crlf_ref "${message}" "${subject}" "${body}" "${branch}" &&
-    message="Subject first line\r\n\r\n\r\nBody first line\r\nBody second line\r\n" &&
-    branch="crlf-empty-lines-after-subject" &&
-    create_crlf_ref "${message}" "${subject}" "${body}" "${branch}" &&
-    message="Subject first line\r\nSubject second line\r\n\r\nBody first line\r\nBody second line\r\n" &&
-    subject="Subject first line Subject second line" &&
-    branch="crlf-two-line-subject" &&
-    create_crlf_ref "${message}" "${subject}" "${body}" "${branch}" &&
-    message="Subject first line\r\nSubject second line" &&
-    subject="Subject first line Subject second line" &&
-    body="" &&
-    branch="crlf-two-line-subject-no-body" &&
-    create_crlf_ref "${message}" "${subject}" "${body}" "${branch}" &&
-    message="Subject first line\r\nSubject second line\r\n" &&
-    branch="crlf-two-line-subject-no-body-trailing-newline" &&
-    create_crlf_ref "${message}" "${subject}" "${body}" "${branch}" &&
-    message="Subject first line\r\nSubject second line\r\n\r" &&
-    branch="crlf-two-line-subject-no-body-trailing-newline2" &&
-    create_crlf_ref "${message}" "${subject}" "${body}" "${branch}"
+    create_crlf_ref crlf <<-\EOF &&
+    Subject first line
+
+    Body first line
+    Body second line
+    EOF
+    create_crlf_ref crlf-empty-lines-after-subject <<-\EOF &&
+    Subject first line
+
+
+    Body first line
+    Body second line
+    EOF
+    create_crlf_ref crlf-two-line-subject <<-\EOF &&
+    Subject first line
+    Subject second line
+
+    Body first line
+    Body second line
+    EOF
+    create_crlf_ref crlf-two-line-subject-no-body <<-\EOF &&
+    Subject first line
+    Subject second line
+    EOF
+    create_crlf_ref crlf-two-line-subject-no-body-trailing-newline <<-\EOF
+    Subject first line
+    Subject second line
+
+    EOF
-test_create_crlf_refs () {
-    test_expect_success 'setup refs with CRLF commit messages' '
-        create_crlf_refs
-    '
-}
-
-cleanup_crlf_refs () {
-    for branch in ${LIB_CRLF_BRANCHES}; do
-        git branch -D ${branch} &&
-        git tag -d tag-${branch} &&
-        rm .crlf-message-${branch}.txt &&
-        rm .crlf-subject-${branch}.txt &&
-        rm .crlf-body-${branch}.txt
-    done
-}

-test_cleanup_crlf_refs () {
-    test_expect_success 'cleanup refs with CRLF commit messages' '
-        cleanup_crlf_refs
-    '
 }



 test_crlf_subject_body_and_contents() {
     command_and_args="$@" &&
     command=$1 &&
-    if [ ${command} = "branch" ] || [ ${command} = "for-each-ref" ] || [ ${command} = "tag" ]; then
+    if test ${command} = "branch" || test ${command} = "for-each-ref" || test ${command} = "tag"
+    then
         atoms="(contents:subject) (contents:body) (contents)"
-    elif [ ${command} = "log" ] || [ ${command} = "show" ]; then
+    elif test ${command} = "log" || test ${command} = "show"
+    then
         atoms="s b B"
     fi &&
     files="subject body message" &&
-    while  [ -n "${atoms}" ]; do
+    while test -n "${atoms}"
+    do
         set ${atoms} && atom=$1 && shift && atoms="$*" &&
         set ${files} &&    file=$1 && shift && files="$*" &&
-        test_expect_success "${command}: --format='%${atom}' works with CRLF input" "
+        test_expect_success "${command}: --format='%${atom}' works with messages using CRLF" "
             rm -f expect &&
-            for ref in ${LIB_CRLF_BRANCHES}; do
+            for ref in ${LIB_CRLF_BRANCHES}
+            do
                 cat .crlf-${file}-\"\${ref}\".txt >>expect &&
                 printf \"\n\" >>expect
             done &&
             git $command_and_args --format=\"%${atom}\" >actual &&
             test_cmp expect actual
         "
     done
 }
+
+
+test_expect_success 'Setup refs with commit and tag messages using CRLF' '
+    test_commit inital &&
+    create_crlf_refs
+'
+

git diff pr-576/phil-blain/fix-branch-verbose-crlf-v3:t/t3203-branch-output.sh HEAD:t/t3920-crlf-messages.sh 
diff --git a/t/t3203-branch-output.sh b/t/t3920-crlf-messages.sh
index c06eca774f..7a13d6b38a 100755
--- a/t/t3203-branch-output.sh
+++ b/t/t3920-crlf-messages.sh
@@ -1,111 +1,97 @@
 #!/bin/sh

-test_expect_success 'git branch -v works with CRLF input' '
-       cat >expect <<-EOF &&
-         two
-         one
-         Subject first line
-         Subject first line
-         Subject first line Subject second line
-         Subject first line Subject second line
-         Subject first line Subject second line
-         Subject first line Subject second line
-       EOF
+test_expect_success 'branch: --verbose works with messages using CRLF' '
+       rm -f expect &&
+       for branch in $LIB_CRLF_BRANCHES
+       do
+               printf "  " >>expect &&
+               cat .crlf-subject-${branch}.txt >>expect &&
+               printf "\n" >>expect
+       done &&
        git branch -v >tmp &&
        # Remove first two columns, and the line for the currently checked out branch
        current=$(git branch --show-current) &&

$ git diff pr-576/phil-blain/fix-branch-verbose-crlf-v3:t/t4202-log.sh HEAD:t/t3920-crlf-messages.sh
diff --git a/t/t4202-log.sh b/t/t3920-crlf-messages.sh
index d4942a6f92..7a13d6b38a 100755
--- a/t/t4202-log.sh
+++ b/t/t3920-crlf-messages.sh
@@ -1,1930 +1,126 @@
-test_expect_success 'oneline with CRLF messages' '
-       for branch in $LIB_CLRF_BRANCHES; do
+test_expect_success 'log: --oneline works with messages using CRLF' '
+       for branch in $LIB_CRLF_BRANCHES
+       do
                cat .crlf-subject-${branch}.txt >expect &&
+               printf "\n" >>expect &&
                git log --oneline -1 ${branch} >tmp-branch &&
                git log --oneline -1 tag-${branch} >tmp-tag &&
-               awk "{print \$NF}" <tmp-branch >actual-branch &&
-               awk "{print \$NF}" <tmp-tag >actual-tag &&
+               cut -d" " -f2- <tmp-branch >actual-branch &&
+               cut -d" " -f2- <tmp-tag >actual-tag &&
                test_cmp expect actual-branch &&
                test_cmp expect actual-tag
        done 
 '


----------------------------------------------------------------------------

The ref-filter code does not correctly handle commit or tag messages that
use CRLF as the line terminator. Such messages can be created with the
--cleanup=verbatim option of git commit and git tag, or by using git
commit-tree directly.

The function find_subpos in ref-filter.c looks for two consecutive '\n' to
find the end of the subject line, a sequence which is absent in messages
using CRLF. This results in the whole message being parsed as the subject
line (%(contents:subject)), and the body of the message (%(contents:body))
being empty.

Moreover, in copy_subject, '\n' is replaced by space, but '\r' is untouched.

This bug is a regression for git branch --verbose, which bisects down to
949af06 (branch: use ref-filter printing APIs, 2017-01-10).

The first patch fixes the bug in ref-filter.c and adds a test script to
check that the ref-filter and pretty APIs deal correctly with CRLF messages.

The second patch adds tests that check the behaviour of git log andgit show 
in the presence of CRLF in messages, to prevent futur regressions.

Philippe Blain (2):
  ref-filter: handle CRLF at end-of-line more gracefully
  log, show: add tests for messages containing CRLF

 ref-filter.c             |  36 ++++++-----
 t/t3920-crlf-messages.sh | 126 +++++++++++++++++++++++++++++++++++++++
 2 files changed, 148 insertions(+), 14 deletions(-)
 create mode 100755 t/t3920-crlf-messages.sh


base-commit: a5fa49ff0a8f3252c6bff49f92b85e7683868f8a
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-576%2Fphil-blain%2Ffix-branch-verbose-crlf-v4
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-576/phil-blain/fix-branch-verbose-crlf-v4
Pull-Request: https://github.com/gitgitgadget/git/pull/576

Range-diff vs v3:

 1:  f17d182c3b < -:  ---------- t: add lib-crlf-messages.sh for messages containing CRLF
 2:  11d044a4f7 ! 1:  03b2d7d78a ref-filter: handle CRLF at end-of-line more gracefully
     @@ Metadata
       ## Commit message ##
          ref-filter: handle CRLF at end-of-line more gracefully
      
     -    The ref-filter code does not correctly handle commit or tag messages that
     -    use CRLF as the line terminator. Such messages can be created with the
     -    `--verbatim` option of `git commit` and `git tag`, or by using `git
     -    commit-tree` directly.
     +    The ref-filter code does not correctly handle commit or tag messages
     +    that use CRLF as the line terminator. Such messages can be created with
     +    the `--cleanup=verbatim` option of `git commit` and `git tag`, or by
     +    using `git commit-tree` directly.
      
          The function `find_subpos` in ref-filter.c looks for two consecutive
          LFs to find the end of the subject line, a sequence which is absent in
          messages using CRLF. This results in the whole message being parsed as
          the subject line (`%(contents:subject)`), and the body of the message
     -    (`%(contents:body)`)  being empty.
     +    (`%(contents:body)`) being empty.
      
          Moreover, in `copy_subject`, which wants to return the subject as a
          single line, '\n' is replaced by space, but '\r' is
     @@ Commit message
          Fix this bug in ref-filter by hardening the logic in `copy_subject` and
          `find_subpos` to correctly parse messages containing CRLF.
      
     -    Add tests for `branch`, `tag` and `for-each-ref` using
     -    lib-crlf-messages.sh.
     -
     -    The 'make commits' test at the beginning of t3203-branch-output.sh does
     -    not use `test_tick` and thus the commit hashes are not reproducible. For
     -    simplicity, use `test_commit` to create the commits, as the content and
     -    name of the files created in this setup test are irrelevant to the rest
     -    of the test script.
     -
     -    Use `test_cleanup_crlf_refs` in t3203-branch-output.sh and t7004-tag.sh
     -    to avoid having to modify the expected output in later tests.
     +    Add a new test script, 't3920-crlf-messages.sh', to test the behaviour
     +    of commands using either the ref-filter or the pretty APIs with messages
     +    using CRLF line endings. The function `test_crlf_subject_body_and_contents`
     +    can be used to test that the `--format` option of `branch`, `tag`,
     +    `for-each-ref`, `log` and `show` correctly displays the subject, body
     +    and raw content of commit and tag messages using CRLF. Test the
     +    output of `branch`, `tag` and `for-each-ref` with such commits.
      
          Helped-by: Junio C Hamano <gitster@pobox.com>
     +    Helped-by: Eric Sunshine <sunshine@sunshineco.com>
          Signed-off-by: Philippe Blain <levraiphilippeblain@gmail.com>
      
       ## ref-filter.c ##
     @@ ref-filter.c: static const char *copy_email(const char *buf, struct used_atom *a
       static char *copy_subject(const char *buf, unsigned long len)
       {
      -	char *r = xmemdupz(buf, len);
     --	int i;
      +	struct strbuf sb = STRBUF_INIT;
     + 	int i;
       
      -	for (i = 0; i < len; i++)
      -		if (r[i] == '\n')
      -			r[i] = ' ';
     -+	for (int i = 0; i < len; i++) {
     ++	for (i = 0; i < len; i++) {
      +		if (buf[i] == '\r' && i + 1 < len && buf[i + 1] == '\n')
      +			continue; /* ignore CR in CRLF */
       
     @@ ref-filter.c: static void find_subpos(const char *buf,
       	*sublen = buf - *sub;
       	/* drop trailing newline, if present */
      -	if (*sublen && (*sub)[*sublen - 1] == '\n')
     -+	while (*sublen && ((*sub)[*sublen - 1] == '\n' || (*sub)[*sublen - 1] == '\r'))
     ++	while (*sublen && ((*sub)[*sublen - 1] == '\n' ||
     ++			   (*sub)[*sublen - 1] == '\r'))
       		*sublen -= 1;
       
       	/* skip any empty lines */
     @@ ref-filter.c: static void find_subpos(const char *buf,
       	*body = buf;
       	*bodylen = strlen(buf);
      
     - ## t/t3203-branch-output.sh ##
     + ## t/t3920-crlf-messages.sh (new) ##
      @@
     - test_description='git branch display tests'
     - . ./test-lib.sh
     - . "$TEST_DIRECTORY"/lib-terminal.sh
     -+. "$TEST_DIRECTORY"/lib-crlf-messages.sh
     - 
     - test_expect_success 'make commits' '
     --	echo content >file &&
     --	git add file &&
     --	git commit -m one &&
     --	echo content >>file &&
     --	git commit -a -m two
     -+	test_commit one &&
     -+	test_commit two
     - '
     - 
     - test_expect_success 'make branches' '
     -@@ t/t3203-branch-output.sh: test_expect_success 'git branch --ignore-case --list -v pattern shows branch sum
     - 	awk "{print \$NF}" <tmp >actual &&
     - 	test_cmp expect actual
     - '
     -+test_create_crlf_refs
     -+
     -+test_expect_success 'git branch -v works with CRLF input' '
     -+	cat >expect <<-EOF &&
     -+	  two
     -+	  one
     -+	  Subject first line
     -+	  Subject first line
     -+	  Subject first line Subject second line
     -+	  Subject first line Subject second line
     -+	  Subject first line Subject second line
     -+	  Subject first line Subject second line
     ++#!/bin/sh
     ++
     ++test_description='Test ref-filter and pretty APIs for commit and tag messages using CRLF'
     ++. ./test-lib.sh
     ++
     ++LIB_CRLF_BRANCHES=""
     ++
     ++create_crlf_ref () {
     ++	branch="$1" &&
     ++	cat >.crlf-orig-$branch.txt &&
     ++	cat .crlf-orig-$branch.txt | append_cr >.crlf-message-$branch.txt &&
     ++	grep 'Subject' .crlf-orig-$branch.txt | tr '\n' ' ' | sed 's/[ ]*$//' | tr -d '\n' >.crlf-subject-$branch.txt &&
     ++	grep 'Body' .crlf-message-$branch.txt >.crlf-body-$branch.txt || true &&
     ++	LIB_CRLF_BRANCHES="${LIB_CRLF_BRANCHES} ${branch}" &&
     ++	test_tick &&
     ++	hash=$(git commit-tree HEAD^{tree} -p HEAD -F .crlf-message-${branch}.txt) &&
     ++	git branch ${branch} ${hash} &&
     ++	git tag tag-${branch} ${branch} -F .crlf-message-${branch}.txt --cleanup=verbatim
     ++}
     ++
     ++create_crlf_refs () {
     ++	create_crlf_ref crlf <<-\EOF &&
     ++	Subject first line
     ++
     ++	Body first line
     ++	Body second line
      +	EOF
     ++	create_crlf_ref crlf-empty-lines-after-subject <<-\EOF &&
     ++	Subject first line
     ++
     ++
     ++	Body first line
     ++	Body second line
     ++	EOF
     ++	create_crlf_ref crlf-two-line-subject <<-\EOF &&
     ++	Subject first line
     ++	Subject second line
     ++
     ++	Body first line
     ++	Body second line
     ++	EOF
     ++	create_crlf_ref crlf-two-line-subject-no-body <<-\EOF &&
     ++	Subject first line
     ++	Subject second line
     ++	EOF
     ++	create_crlf_ref crlf-two-line-subject-no-body-trailing-newline <<-\EOF
     ++	Subject first line
     ++	Subject second line
     ++
     ++	EOF
     ++}
     ++
     ++test_crlf_subject_body_and_contents() {
     ++	command_and_args="$@" &&
     ++	command=$1 &&
     ++	if test ${command} = "branch" || test ${command} = "for-each-ref" || test ${command} = "tag"
     ++	then
     ++		atoms="(contents:subject) (contents:body) (contents)"
     ++	elif test ${command} = "log" || test ${command} = "show"
     ++	then
     ++		atoms="s b B"
     ++	fi &&
     ++	files="subject body message" &&
     ++	while test -n "${atoms}"
     ++	do
     ++		set ${atoms} && atom=$1 && shift && atoms="$*" &&
     ++		set ${files} &&	file=$1 && shift && files="$*" &&
     ++		test_expect_success "${command}: --format='%${atom}' works with messages using CRLF" "
     ++			rm -f expect &&
     ++			for ref in ${LIB_CRLF_BRANCHES}
     ++			do
     ++				cat .crlf-${file}-\"\${ref}\".txt >>expect &&
     ++				printf \"\n\" >>expect
     ++			done &&
     ++			git $command_and_args --format=\"%${atom}\" >actual &&
     ++			test_cmp expect actual
     ++		"
     ++	done
     ++}
     ++
     ++
     ++test_expect_success 'Setup refs with commit and tag messages using CRLF' '
     ++	test_commit inital &&
     ++	create_crlf_refs
     ++'
     ++
     ++test_expect_success 'branch: --verbose works with messages using CRLF' '
     ++	rm -f expect &&
     ++	for branch in $LIB_CRLF_BRANCHES
     ++	do
     ++		printf "  " >>expect &&
     ++		cat .crlf-subject-${branch}.txt >>expect &&
     ++		printf "\n" >>expect
     ++	done &&
      +	git branch -v >tmp &&
      +	# Remove first two columns, and the line for the currently checked out branch
      +	current=$(git branch --show-current) &&
     @@ t/t3203-branch-output.sh: test_expect_success 'git branch --ignore-case --list -
      +
      +test_crlf_subject_body_and_contents branch --list crlf*
      +
     -+test_cleanup_crlf_refs
     - 
     - test_expect_success 'git branch -v pattern does not show branch summaries' '
     - 	test_must_fail git branch -v branch*
     -
     - ## t/t6300-for-each-ref.sh ##
     -@@ t/t6300-for-each-ref.sh: test_description='for-each-ref test'
     - . ./test-lib.sh
     - . "$TEST_DIRECTORY"/lib-gpg.sh
     - . "$TEST_DIRECTORY"/lib-terminal.sh
     -+. "$TEST_DIRECTORY"/lib-crlf-messages.sh
     - 
     - # Mon Jul 3 23:18:43 2006 +0000
     - datestamp=1151968723
     -@@ t/t6300-for-each-ref.sh: test_expect_success 'for-each-ref --ignore-case works on multiple sort keys' '
     - 	test_cmp expect actual
     - '
     - 
     -+test_create_crlf_refs
     -+
     -+test_crlf_subject_body_and_contents for-each-ref refs/heads/crlf*
     -+
     - test_done
     -
     - ## t/t7004-tag.sh ##
     -@@ t/t7004-tag.sh: Tests for operations with tags.'
     - . ./test-lib.sh
     - . "$TEST_DIRECTORY"/lib-gpg.sh
     - . "$TEST_DIRECTORY"/lib-terminal.sh
     -+. "$TEST_DIRECTORY"/lib-crlf-messages.sh
     - 
     - # creating and listing lightweight tags:
     - 
     -@@ t/t7004-tag.sh: test_expect_success '--format should list tags as per format given' '
     - 	test_cmp expect actual
     - '
     - 
     -+test_create_crlf_refs
     -+
      +test_crlf_subject_body_and_contents tag --list tag-crlf*
      +
     -+test_cleanup_crlf_refs
     ++test_crlf_subject_body_and_contents for-each-ref refs/heads/crlf*
      +
     - test_expect_success "set up color tests" '
     - 	echo "<RED>v1.0<RESET>" >expect.color &&
     - 	echo "v1.0" >expect.bare &&
     ++test_done
 3:  59957d1391 ! 2:  75a87887be log, show: add tests for messages containing CRLF
     @@ Commit message
          such messages, to prevent futur regressions if these commands are
          refactored to use the ref-filter API.
      
     -    To prevent having to modify expected output in further tests, use
     -    'test_cleanup_crlf_refs' in t4202 to clean-up after the added tests.
     -
          Signed-off-by: Philippe Blain <levraiphilippeblain@gmail.com>
      
     - ## t/t4202-log.sh ##
     -@@ t/t4202-log.sh: test_description='git log'
     - . "$TEST_DIRECTORY/lib-gpg.sh"
     - . "$TEST_DIRECTORY/lib-terminal.sh"
     - . "$TEST_DIRECTORY/lib-log-graph.sh"
     -+. "$TEST_DIRECTORY/lib-crlf-messages.sh"
     + ## t/t3920-crlf-messages.sh ##
     +@@ t/t3920-crlf-messages.sh: test_crlf_subject_body_and_contents tag --list tag-crlf*
       
     - test_cmp_graph () {
     - 	lib_test_cmp_graph --format=%s "$@"
     -@@ t/t4202-log.sh: test_expect_success 'oneline' '
     - 	test_cmp expect actual
     - '
     + test_crlf_subject_body_and_contents for-each-ref refs/heads/crlf*
       
     -+test_create_crlf_refs
     -+
     -+test_expect_success 'oneline with CRLF messages' '
     -+	for branch in $LIB_CLRF_BRANCHES; do
     ++test_expect_success 'log: --oneline works with messages using CRLF' '
     ++	for branch in $LIB_CRLF_BRANCHES
     ++	do
      +		cat .crlf-subject-${branch}.txt >expect &&
     ++		printf "\n" >>expect &&
      +		git log --oneline -1 ${branch} >tmp-branch &&
      +		git log --oneline -1 tag-${branch} >tmp-tag &&
     -+		awk "{print \$NF}" <tmp-branch >actual-branch &&
     -+		awk "{print \$NF}" <tmp-tag >actual-tag &&
     ++		cut -d" " -f2- <tmp-branch >actual-branch &&
     ++		cut -d" " -f2- <tmp-tag >actual-tag &&
      +		test_cmp expect actual-branch &&
      +		test_cmp expect actual-tag
      +	done
      +'
     -+test_crlf_subject_body_and_contents log --all --reverse --grep Subject
      +
     -+test_cleanup_crlf_refs
     -+
     - test_expect_success 'diff-filter=A' '
     - 
     - 	git log --no-renames --pretty="format:%s" --diff-filter=A HEAD > actual &&
     -
     - ## t/t7007-show.sh ##
     -@@
     - test_description='git show'
     - 
     - . ./test-lib.sh
     -+. "$TEST_DIRECTORY/lib-crlf-messages.sh"
     - 
     - test_expect_success setup '
     - 	echo hello world >foo &&
     -@@ t/t7007-show.sh: test_expect_success 'show --graph is forbidden' '
     -   test_must_fail git show --graph HEAD
     - '
     - 
     -+test_create_crlf_refs
     ++test_crlf_subject_body_and_contents log --all --reverse --grep Subject
      +
      +test_crlf_subject_body_and_contents show $LIB_CRLF_BRANCHES
      +

-- 
gitgitgadget

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH v4 1/2] ref-filter: handle CRLF at end-of-line more gracefully
  2020-10-22  3:01     ` [PATCH v4 0/2] ref-filter: handle CRLF at end-of-line more gracefully Philippe Blain via GitGitGadget
@ 2020-10-22  3:01       ` Philippe Blain via GitGitGadget
  2020-10-23  0:52         ` Junio C Hamano
  2020-10-22  3:01       ` [PATCH v4 2/2] log, show: add tests for messages containing CRLF Philippe Blain via GitGitGadget
  2020-10-29 12:48       ` [PATCH v5 0/2] ref-filter: handle CRLF at end-of-line more gracefully Philippe Blain via GitGitGadget
  2 siblings, 1 reply; 38+ messages in thread
From: Philippe Blain via GitGitGadget @ 2020-10-22  3:01 UTC (permalink / raw)
  To: git
  Cc: Michael J Gruber, Matthieu Moy, John Keeping, Karthik Nayak,
	Jeff King, Alex Henrie, Eric Sunshine, Philippe Blain,
	Philippe Blain

From: Philippe Blain <levraiphilippeblain@gmail.com>

The ref-filter code does not correctly handle commit or tag messages
that use CRLF as the line terminator. Such messages can be created with
the `--cleanup=verbatim` option of `git commit` and `git tag`, or by
using `git commit-tree` directly.

The function `find_subpos` in ref-filter.c looks for two consecutive
LFs to find the end of the subject line, a sequence which is absent in
messages using CRLF. This results in the whole message being parsed as
the subject line (`%(contents:subject)`), and the body of the message
(`%(contents:body)`) being empty.

Moreover, in `copy_subject`, which wants to return the subject as a
single line, '\n' is replaced by space, but '\r' is
untouched.

This impacts the output of `git branch`, `git tag` and `git
for-each-ref`.

This bug is a regression for `git branch --verbose`, which
bisects down to 949af0684c (branch: use ref-filter printing APIs,
2017-01-10).

Fix this bug in ref-filter by hardening the logic in `copy_subject` and
`find_subpos` to correctly parse messages containing CRLF.

Add a new test script, 't3920-crlf-messages.sh', to test the behaviour
of commands using either the ref-filter or the pretty APIs with messages
using CRLF line endings. The function `test_crlf_subject_body_and_contents`
can be used to test that the `--format` option of `branch`, `tag`,
`for-each-ref`, `log` and `show` correctly displays the subject, body
and raw content of commit and tag messages using CRLF. Test the
output of `branch`, `tag` and `for-each-ref` with such commits.

Helped-by: Junio C Hamano <gitster@pobox.com>
Helped-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Philippe Blain <levraiphilippeblain@gmail.com>
---
 ref-filter.c             |  36 ++++++++-----
 t/t3920-crlf-messages.sh | 108 +++++++++++++++++++++++++++++++++++++++
 2 files changed, 130 insertions(+), 14 deletions(-)
 create mode 100755 t/t3920-crlf-messages.sh

diff --git a/ref-filter.c b/ref-filter.c
index c62f6b4822..6476686fea 100644
--- a/ref-filter.c
+++ b/ref-filter.c
@@ -1097,14 +1097,19 @@ static const char *copy_email(const char *buf, struct used_atom *atom)
 
 static char *copy_subject(const char *buf, unsigned long len)
 {
-	char *r = xmemdupz(buf, len);
+	struct strbuf sb = STRBUF_INIT;
 	int i;
 
-	for (i = 0; i < len; i++)
-		if (r[i] == '\n')
-			r[i] = ' ';
+	for (i = 0; i < len; i++) {
+		if (buf[i] == '\r' && i + 1 < len && buf[i + 1] == '\n')
+			continue; /* ignore CR in CRLF */
 
-	return r;
+		if (buf[i] == '\n')
+			strbuf_addch(&sb, ' ');
+		else
+			strbuf_addch(&sb, buf[i]);
+	}
+	return strbuf_detach(&sb, NULL);
 }
 
 static void grab_date(const char *buf, struct atom_value *v, const char *atomname)
@@ -1228,20 +1233,23 @@ static void find_subpos(const char *buf,
 
 	/* subject is first non-empty line */
 	*sub = buf;
-	/* subject goes to first empty line */
-	while (buf < *sig && *buf && *buf != '\n') {
-		eol = strchrnul(buf, '\n');
-		if (*eol)
-			eol++;
-		buf = eol;
-	}
+	/* subject goes to first empty line before signature begins */
+	if ((eol = strstr(*sub, "\n\n"))) {
+		eol = eol < *sig ? eol : *sig;
+	/* check if message uses CRLF */
+	} else if (! (eol = strstr(*sub, "\r\n\r\n"))) {
+		/* treat whole message as subject */
+		eol = strrchr(*sub, '\0');
+	}
+	buf = eol;
 	*sublen = buf - *sub;
 	/* drop trailing newline, if present */
-	if (*sublen && (*sub)[*sublen - 1] == '\n')
+	while (*sublen && ((*sub)[*sublen - 1] == '\n' ||
+			   (*sub)[*sublen - 1] == '\r'))
 		*sublen -= 1;
 
 	/* skip any empty lines */
-	while (*buf == '\n')
+	while (*buf == '\n' || *buf == '\r')
 		buf++;
 	*body = buf;
 	*bodylen = strlen(buf);
diff --git a/t/t3920-crlf-messages.sh b/t/t3920-crlf-messages.sh
new file mode 100755
index 0000000000..3f0ce02c3f
--- /dev/null
+++ b/t/t3920-crlf-messages.sh
@@ -0,0 +1,108 @@
+#!/bin/sh
+
+test_description='Test ref-filter and pretty APIs for commit and tag messages using CRLF'
+. ./test-lib.sh
+
+LIB_CRLF_BRANCHES=""
+
+create_crlf_ref () {
+	branch="$1" &&
+	cat >.crlf-orig-$branch.txt &&
+	cat .crlf-orig-$branch.txt | append_cr >.crlf-message-$branch.txt &&
+	grep 'Subject' .crlf-orig-$branch.txt | tr '\n' ' ' | sed 's/[ ]*$//' | tr -d '\n' >.crlf-subject-$branch.txt &&
+	grep 'Body' .crlf-message-$branch.txt >.crlf-body-$branch.txt || true &&
+	LIB_CRLF_BRANCHES="${LIB_CRLF_BRANCHES} ${branch}" &&
+	test_tick &&
+	hash=$(git commit-tree HEAD^{tree} -p HEAD -F .crlf-message-${branch}.txt) &&
+	git branch ${branch} ${hash} &&
+	git tag tag-${branch} ${branch} -F .crlf-message-${branch}.txt --cleanup=verbatim
+}
+
+create_crlf_refs () {
+	create_crlf_ref crlf <<-\EOF &&
+	Subject first line
+
+	Body first line
+	Body second line
+	EOF
+	create_crlf_ref crlf-empty-lines-after-subject <<-\EOF &&
+	Subject first line
+
+
+	Body first line
+	Body second line
+	EOF
+	create_crlf_ref crlf-two-line-subject <<-\EOF &&
+	Subject first line
+	Subject second line
+
+	Body first line
+	Body second line
+	EOF
+	create_crlf_ref crlf-two-line-subject-no-body <<-\EOF &&
+	Subject first line
+	Subject second line
+	EOF
+	create_crlf_ref crlf-two-line-subject-no-body-trailing-newline <<-\EOF
+	Subject first line
+	Subject second line
+
+	EOF
+}
+
+test_crlf_subject_body_and_contents() {
+	command_and_args="$@" &&
+	command=$1 &&
+	if test ${command} = "branch" || test ${command} = "for-each-ref" || test ${command} = "tag"
+	then
+		atoms="(contents:subject) (contents:body) (contents)"
+	elif test ${command} = "log" || test ${command} = "show"
+	then
+		atoms="s b B"
+	fi &&
+	files="subject body message" &&
+	while test -n "${atoms}"
+	do
+		set ${atoms} && atom=$1 && shift && atoms="$*" &&
+		set ${files} &&	file=$1 && shift && files="$*" &&
+		test_expect_success "${command}: --format='%${atom}' works with messages using CRLF" "
+			rm -f expect &&
+			for ref in ${LIB_CRLF_BRANCHES}
+			do
+				cat .crlf-${file}-\"\${ref}\".txt >>expect &&
+				printf \"\n\" >>expect
+			done &&
+			git $command_and_args --format=\"%${atom}\" >actual &&
+			test_cmp expect actual
+		"
+	done
+}
+
+
+test_expect_success 'Setup refs with commit and tag messages using CRLF' '
+	test_commit inital &&
+	create_crlf_refs
+'
+
+test_expect_success 'branch: --verbose works with messages using CRLF' '
+	rm -f expect &&
+	for branch in $LIB_CRLF_BRANCHES
+	do
+		printf "  " >>expect &&
+		cat .crlf-subject-${branch}.txt >>expect &&
+		printf "\n" >>expect
+	done &&
+	git branch -v >tmp &&
+	# Remove first two columns, and the line for the currently checked out branch
+	current=$(git branch --show-current) &&
+	grep -v $current <tmp | awk "{\$1=\$2=\"\"}1"  >actual &&
+	test_cmp expect actual
+'
+
+test_crlf_subject_body_and_contents branch --list crlf*
+
+test_crlf_subject_body_and_contents tag --list tag-crlf*
+
+test_crlf_subject_body_and_contents for-each-ref refs/heads/crlf*
+
+test_done
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH v4 2/2] log, show: add tests for messages containing CRLF
  2020-10-22  3:01     ` [PATCH v4 0/2] ref-filter: handle CRLF at end-of-line more gracefully Philippe Blain via GitGitGadget
  2020-10-22  3:01       ` [PATCH v4 1/2] " Philippe Blain via GitGitGadget
@ 2020-10-22  3:01       ` Philippe Blain via GitGitGadget
  2020-10-22 19:24         ` Philippe Blain
  2020-10-29 12:48       ` [PATCH v5 0/2] ref-filter: handle CRLF at end-of-line more gracefully Philippe Blain via GitGitGadget
  2 siblings, 1 reply; 38+ messages in thread
From: Philippe Blain via GitGitGadget @ 2020-10-22  3:01 UTC (permalink / raw)
  To: git
  Cc: Michael J Gruber, Matthieu Moy, John Keeping, Karthik Nayak,
	Jeff King, Alex Henrie, Eric Sunshine, Philippe Blain,
	Philippe Blain

From: Philippe Blain <levraiphilippeblain@gmail.com>

A previous commit fixed a bug in ref-filter.c causing messages
containing CRLF to be incorrectly parsed and displayed.

Add tests to also check that `git log` and `git show` correctly handle
such messages, to prevent futur regressions if these commands are
refactored to use the ref-filter API.

Signed-off-by: Philippe Blain <levraiphilippeblain@gmail.com>
---
 t/t3920-crlf-messages.sh | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)

diff --git a/t/t3920-crlf-messages.sh b/t/t3920-crlf-messages.sh
index 3f0ce02c3f..b6e09be412 100755
--- a/t/t3920-crlf-messages.sh
+++ b/t/t3920-crlf-messages.sh
@@ -105,4 +105,22 @@ test_crlf_subject_body_and_contents tag --list tag-crlf*
 
 test_crlf_subject_body_and_contents for-each-ref refs/heads/crlf*
 
+test_expect_success 'log: --oneline works with messages using CRLF' '
+	for branch in $LIB_CRLF_BRANCHES
+	do
+		cat .crlf-subject-${branch}.txt >expect &&
+		printf "\n" >>expect &&
+		git log --oneline -1 ${branch} >tmp-branch &&
+		git log --oneline -1 tag-${branch} >tmp-tag &&
+		cut -d" " -f2- <tmp-branch >actual-branch &&
+		cut -d" " -f2- <tmp-tag >actual-tag &&
+		test_cmp expect actual-branch &&
+		test_cmp expect actual-tag
+	done
+'
+
+test_crlf_subject_body_and_contents log --all --reverse --grep Subject
+
+test_crlf_subject_body_and_contents show $LIB_CRLF_BRANCHES
+
 test_done
-- 
gitgitgadget

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* Re: [PATCH v3 1/3] t: add lib-crlf-messages.sh for messages containing CRLF
  2020-10-12 22:47       ` Eric Sunshine
  2020-10-14 13:20         ` Philippe Blain
@ 2020-10-22  3:09         ` Philippe Blain
  1 sibling, 0 replies; 38+ messages in thread
From: Philippe Blain @ 2020-10-22  3:09 UTC (permalink / raw)
  To: Eric Sunshine
  Cc: Philippe Blain via GitGitGadget, Git mailing list,
	Michael J Gruber, Matthieu Moy, John Keeping, Karthik Nayak,
	Jeff King, Alex Henrie

Hi Eric,

>> +cleanup_crlf_refs () {
>> +	for branch in ${LIB_CRLF_BRANCHES}; do
> 
> Our style is to place 'do' on its own line:
> 
>    for branch in $LIB_CRLF_BRANCHES
>    do
>        ...
> 
> This would be a syntax error if LIB_CRLF_BRANCHES is empty for some
> reason, but I suppose we don't really have to worry about it here(?).

Apparently, not in my shell, as I realized I had misspelled LIB_CRLF_BRANCHES as
LIB_CLRF_BRANCHES (CLRF instead of CRLF) at another place and the test was
passing correctly (the loop was not being entered at all though):

$ /bin/sh --version
GNU bash, version 3.2.57(1)-release (x86_64-apple-darwin15)
$ for a in $b; do echo hello; done; echo $?
0

I've fixed that in v4 which I just sent. 

Cheers,
Philippe.


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v4 2/2] log, show: add tests for messages containing CRLF
  2020-10-22  3:01       ` [PATCH v4 2/2] log, show: add tests for messages containing CRLF Philippe Blain via GitGitGadget
@ 2020-10-22 19:24         ` Philippe Blain
  0 siblings, 0 replies; 38+ messages in thread
From: Philippe Blain @ 2020-10-22 19:24 UTC (permalink / raw)
  To: Philippe Blain via GitGitGadget
  Cc: Git mailing list, Michael J Gruber, Matthieu Moy, John Keeping,
	Karthik Nayak, Jeff King, Alex Henrie, Eric Sunshine


> Le 21 oct. 2020 à 23:01, Philippe Blain via GitGitGadget <gitgitgadget@gmail.com> a écrit :
> 
> From: Philippe Blain <levraiphilippeblain@gmail.com>
> 
> A previous commit fixed a bug in ref-filter.c causing messages
> containing CRLF to be incorrectly parsed and displayed.
> 
> Add tests to also check that `git log` and `git show` correctly handle
> such messages, to prevent futur regressions if these commands are
> refactored to use the ref-filter API.
> 
> Signed-off-by: Philippe Blain <levraiphilippeblain@gmail.com>
> ---
> t/t3920-crlf-messages.sh | 18 ++++++++++++++++++
> 1 file changed, 18 insertions(+)
> 
> diff --git a/t/t3920-crlf-messages.sh b/t/t3920-crlf-messages.sh
> index 3f0ce02c3f..b6e09be412 100755
> --- a/t/t3920-crlf-messages.sh
> +++ b/t/t3920-crlf-messages.sh
> @@ -105,4 +105,22 @@ test_crlf_subject_body_and_contents tag --list tag-crlf*
> 
> test_crlf_subject_body_and_contents for-each-ref refs/heads/crlf*
> 
> +test_expect_success 'log: --oneline works with messages using CRLF' '
> +	for branch in $LIB_CRLF_BRANCHES
> +	do
> +		cat .crlf-subject-${branch}.txt >expect &&
> +		printf "\n" >>expect &&
> +		git log --oneline -1 ${branch} >tmp-branch &&
> +		git log --oneline -1 tag-${branch} >tmp-tag &&
> +		cut -d" " -f2- <tmp-branch >actual-branch &&
> +		cut -d" " -f2- <tmp-tag >actual-tag &&
> +		test_cmp expect actual-branch &&
> +		test_cmp expect actual-tag
> +	done

I just realized that I'm missing this in this patch :

diff --git a/t/t3920-crlf-messages.sh b/t/t3920-crlf-messages.sh
index b6e09be412..70ddce3a2e 100755
--- a/t/t3920-crlf-messages.sh
+++ b/t/t3920-crlf-messages.sh
@@ -115,7 +115,7 @@ test_expect_success 'log: --oneline works with messages using CRLF' '
 		cut -d" " -f2- <tmp-branch >actual-branch &&
 		cut -d" " -f2- <tmp-tag >actual-tag &&
 		test_cmp expect actual-branch &&
-		test_cmp expect actual-tag
+		test_cmp expect actual-tag || return 1
 	done
 '

I'll wait a few days for any further comments and re-send with this tweak.

Philippe.

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* Re: [PATCH v4 1/2] ref-filter: handle CRLF at end-of-line more gracefully
  2020-10-22  3:01       ` [PATCH v4 1/2] " Philippe Blain via GitGitGadget
@ 2020-10-23  0:52         ` Junio C Hamano
  2020-10-23  1:46           ` Philippe Blain
  0 siblings, 1 reply; 38+ messages in thread
From: Junio C Hamano @ 2020-10-23  0:52 UTC (permalink / raw)
  To: Philippe Blain via GitGitGadget
  Cc: git, Michael J Gruber, Matthieu Moy, John Keeping, Karthik Nayak,
	Jeff King, Alex Henrie, Eric Sunshine, Philippe Blain

"Philippe Blain via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Philippe Blain <levraiphilippeblain@gmail.com>
>
> The ref-filter code does not correctly handle commit or tag messages
> that use CRLF as the line terminator. Such messages can be created with
> the `--cleanup=verbatim` option of `git commit` and `git tag`, or by
> using `git commit-tree` directly.
>
> The function `find_subpos` in ref-filter.c looks for two consecutive
> LFs to find the end of the subject line, a sequence which is absent in
> messages using CRLF. This results in the whole message being parsed as
> the subject line (`%(contents:subject)`), and the body of the message
> (`%(contents:body)`) being empty.
>
> Moreover, in `copy_subject`, which wants to return the subject as a
> single line, '\n' is replaced by space, but '\r' is
> untouched.

Honestly, all of the above signal, at least to me, that these
objects are designed to use LF terminated lines and nothing else,
whether Windows or DOS existed in the same world or not.  There is
no such thing as commit objects that use CRLF as the line
terminator.  They are commit objects whose payload has CR at the end
of each and every line.  Just like there can be commit objects whose
payload has trailing SP on each line, or even has binary guck, these
things can be created using the "commit --cleanup=verbatim" command,
or the "hash-objects" command.  It does not mean it is encouraged to
create such objects.  It does not mean it is sensible to expect them
to behave as if these trailing whitespaces (be it SP or CR) are not
there.

> This impacts the output of `git branch`, `git tag` and `git
> for-each-ref`.

The answer to that problem description is "then don't" ;-).  If you
do not want to have trailing whitespaces, you need to clean them up
somehow, and we give an easy way to do so with the default --cleanup
action.  Setting it to verbatim is to decline that easy way offered
to you, and it makes it your responsibility to do your own clean-up
if you still want to remove the CR at the end of your lines.

Having said all that.

Here is how I explained the topic in the "What's cooking" report.

     A commit and tag object may have CR at the end of each and
     every line (you can create such an object with hash-object or
     using --cleanup=verbatim to decline the default clean-up
     action), but it would make it impossible to have a blank line
     to separate the title from the body of the message.  Be lenient
     and accept a line with lone CR on it as a blank line, too.

Let's not call this change a "bug fix".  The phrase you used in your
title, "more gracefully", is a very good one.

In the meantime, I've squashed your "oops forgot ||return 1" change
into [PATCH 2/2].

Thanks.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v4 1/2] ref-filter: handle CRLF at end-of-line more gracefully
  2020-10-23  0:52         ` Junio C Hamano
@ 2020-10-23  1:46           ` Philippe Blain
  2020-10-28 20:24             ` Junio C Hamano
  0 siblings, 1 reply; 38+ messages in thread
From: Philippe Blain @ 2020-10-23  1:46 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Philippe Blain via GitGitGadget, Git mailing list,
	Michael J Gruber, Matthieu Moy, John Keeping, Karthik Nayak,
	Jeff King, Alex Henrie, Eric Sunshine

Hi Junio,

> Le 22 oct. 2020 à 20:52, Junio C Hamano <gitster@pobox.com> a écrit :
> 
> "Philippe Blain via GitGitGadget" <gitgitgadget@gmail.com> writes:
> 
>> From: Philippe Blain <levraiphilippeblain@gmail.com>
>> 
>> The ref-filter code does not correctly handle commit or tag messages
>> that use CRLF as the line terminator. Such messages can be created with
>> the `--cleanup=verbatim` option of `git commit` and `git tag`, or by
>> using `git commit-tree` directly.
>> 
>> The function `find_subpos` in ref-filter.c looks for two consecutive
>> LFs to find the end of the subject line, a sequence which is absent in
>> messages using CRLF. This results in the whole message being parsed as
>> the subject line (`%(contents:subject)`), and the body of the message
>> (`%(contents:body)`) being empty.
>> 
>> Moreover, in `copy_subject`, which wants to return the subject as a
>> single line, '\n' is replaced by space, but '\r' is
>> untouched.
> 
> Honestly, all of the above signal, at least to me, that these
> objects are designed to use LF terminated lines and nothing else,
> whether Windows or DOS existed in the same world or not.  There is
> no such thing as commit objects that use CRLF as the line
> terminator.  They are commit objects whose payload has CR at the end
> of each and every line.  Just like there can be commit objects whose
> payload has trailing SP on each line, or even has binary guck, these
> things can be created using the "commit --cleanup=verbatim" command,
> or the "hash-objects" command.  It does not mean it is encouraged to
> create such objects.  It does not mean it is sensible to expect them
> to behave as if these trailing whitespaces (be it SP or CR) are not
> there.
> 
>> This impacts the output of `git branch`, `git tag` and `git
>> for-each-ref`.
> 
> The answer to that problem description is "then don't" ;-).  If you
> do not want to have trailing whitespaces, you need to clean them up
> somehow, and we give an easy way to do so with the default --cleanup
> action.  Setting it to verbatim is to decline that easy way offered
> to you, and it makes it your responsibility to do your own clean-up
> if you still want to remove the CR at the end of your lines.

I agree with you on that : if you are creating the object yourself,
you should let the default cleanup take place.

But as a lot of projects use GitHub, GitLab or similar services
to accept contributions, and let these web systems perform the "merge" 
(or rebase or whatever) operation to integrate these contributions;
maintainers sometime choose to not always have complete control
on all objects that become part of the canonical history of their repository. 
And as I wrote in [1], GitLab was creating commits using CRLF up until 9.2... [2].
So for these poor projects that are now stuck with these CRLFs in their
merge commit messages, I think it's good that Git handles these correctly.

> Having said all that.
> 
> Here is how I explained the topic in the "What's cooking" report.
> 
>     A commit and tag object may have CR at the end of each and
>     every line (you can create such an object with hash-object or
>     using --cleanup=verbatim to decline the default clean-up
>     action), but it would make it impossible to have a blank line
>     to separate the title from the body of the message.  Be lenient
>     and accept a line with lone CR on it as a blank line, too.

Just for the sake of searchability, I think it would be good to have 
CRLF spelled out in this topic description (since I gather this is what 
ends up in the release notes). But I don't feel that strongly
about that.

> Let's not call this change a "bug fix".  The phrase you used in your
> title, "more gracefully", is a very good one.

It was your suggestion ;) 

> In the meantime, I've squashed your "oops forgot ||return 1" change
> into [PATCH 2/2].

Thanks for squashing it in.

Cheers,
Philippe.


[1] https://lore.kernel.org/git/63755050-10A5-4A46-9BB3-8207E055692C@gmail.com/
[2] https://gitlab.com/gitlab-org/gitlab-foss/-/issues/31671

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v4 1/2] ref-filter: handle CRLF at end-of-line more gracefully
  2020-10-23  1:46           ` Philippe Blain
@ 2020-10-28 20:24             ` Junio C Hamano
  2020-10-29  1:29               ` Philippe Blain
  0 siblings, 1 reply; 38+ messages in thread
From: Junio C Hamano @ 2020-10-28 20:24 UTC (permalink / raw)
  To: Philippe Blain
  Cc: Philippe Blain via GitGitGadget, Git mailing list,
	Michael J Gruber, Matthieu Moy, John Keeping, Karthik Nayak,
	Jeff King, Alex Henrie, Eric Sunshine

Philippe Blain <levraiphilippeblain@gmail.com> writes:

>> Having said all that.
>> 
>> Here is how I explained the topic in the "What's cooking" report.
>> 
>>     A commit and tag object may have CR at the end of each and
>>     every line (you can create such an object with hash-object or
>>     using --cleanup=verbatim to decline the default clean-up
>>     action), but it would make it impossible to have a blank line
>>     to separate the title from the body of the message.  Be lenient
>>     and accept a line with lone CR on it as a blank line, too.
>
> Just for the sake of searchability, I think it would be good to have 
> CRLF spelled out in this topic description (since I gather this is what 
> ends up in the release notes). But I don't feel that strongly
> about that.
>
>> Let's not call this change a "bug fix".  The phrase you used in your
>> title, "more gracefully", is a very good one.
>
> It was your suggestion ;) 
>
>> In the meantime, I've squashed your "oops forgot ||return 1" change
>> into [PATCH 2/2].
>
> Thanks for squashing it in.

Squashing in the "oops forgot || return 1" was the only thing I did.
I did not rewrite (and will not do so myself) the proposed log
message 1/2, which needs to happen before the topic can hit 'next'.

Thanks.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v4 1/2] ref-filter: handle CRLF at end-of-line more gracefully
  2020-10-28 20:24             ` Junio C Hamano
@ 2020-10-29  1:29               ` Philippe Blain
  0 siblings, 0 replies; 38+ messages in thread
From: Philippe Blain @ 2020-10-29  1:29 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Philippe Blain via GitGitGadget, Git mailing list,
	Michael J Gruber, Matthieu Moy, John Keeping, Karthik Nayak,
	Jeff King, Alex Henrie, Eric Sunshine

Hi Junio,

> Le 28 oct. 2020 à 16:24, Junio C Hamano <gitster@pobox.com> a écrit :
> 
> Philippe Blain <levraiphilippeblain@gmail.com> writes:
> 
>>> Having said all that.
>>> 
>>> Here is how I explained the topic in the "What's cooking" report.
>>> 
>>>    A commit and tag object may have CR at the end of each and
>>>    every line (you can create such an object with hash-object or
>>>    using --cleanup=verbatim to decline the default clean-up
>>>    action), but it would make it impossible to have a blank line
>>>    to separate the title from the body of the message.  Be lenient
>>>    and accept a line with lone CR on it as a blank line, too.
>> 
>> Just for the sake of searchability, I think it would be good to have 
>> CRLF spelled out in this topic description (since I gather this is what 
>> ends up in the release notes). But I don't feel that strongly
>> about that.
>> 
>>> Let's not call this change a "bug fix".  The phrase you used in your
>>> title, "more gracefully", is a very good one.
>> 
>> It was your suggestion ;) 
>> 
>>> In the meantime, I've squashed your "oops forgot ||return 1" change
>>> into [PATCH 2/2].
>> 
>> Thanks for squashing it in.
> 
> Squashing in the "oops forgot || return 1" was the only thing I did.
> I did not rewrite (and will not do so myself) the proposed log
> message 1/2, which needs to happen before the topic can hit 'next'.

Ah! I thought you meant "Let's not call it a bug fix in the release notes".
I'll send a new version to not mention "bug" in the log messages.

Thanks,

Philippe.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* [PATCH v5 0/2] ref-filter: handle CRLF at end-of-line more gracefully
  2020-10-22  3:01     ` [PATCH v4 0/2] ref-filter: handle CRLF at end-of-line more gracefully Philippe Blain via GitGitGadget
  2020-10-22  3:01       ` [PATCH v4 1/2] " Philippe Blain via GitGitGadget
  2020-10-22  3:01       ` [PATCH v4 2/2] log, show: add tests for messages containing CRLF Philippe Blain via GitGitGadget
@ 2020-10-29 12:48       ` Philippe Blain via GitGitGadget
  2020-10-29 12:48         ` [PATCH v5 1/2] " Philippe Blain via GitGitGadget
  2020-10-29 12:48         ` [PATCH v5 2/2] log, show: add tests for messages containing CRLF Philippe Blain via GitGitGadget
  2 siblings, 2 replies; 38+ messages in thread
From: Philippe Blain via GitGitGadget @ 2020-10-29 12:48 UTC (permalink / raw)
  To: git; +Cc: Philippe Blain, Philippe Blain

Changes since v4 :

 * tweaked the wording of both commit messages to not call the behaviour a
   bug
 * added || return 1 in one test - that had already been squashed in by
   Junio


----------------------------------------------------------------------------

The ref-filter code does not correctly handle commit or tag messages that
use CRLF as the line terminator. Such messages can be created with the
--cleanup=verbatim option of git commit and git tag, or by using git
commit-tree directly.

The function find_subpos in ref-filter.c looks for two consecutive '\n' to
find the end of the subject line, a sequence which is absent in messages
using CRLF. This results in the whole message being parsed as the subject
line (%(contents:subject)), and the body of the message (%(contents:body))
being empty.

Moreover, in copy_subject, '\n' is replaced by space, but '\r' is untouched.

This behaviour is a regression for git branch --verbose, which bisects down
to 949af06 (branch: use ref-filter printing APIs, 2017-01-10).

The first patch hardens the code in ref-filter.c to handle these messages
more gracefully and adds a test script to check that the ref-filter and
pretty APIs deal correctly with CRLF messages.

The second patch adds tests that check the behaviour of git log andgit show 
in the presence of CRLF in messages, to prevent futur regressions.

Cc: Michael J Gruber git@grubix.eu [git@grubix.eu], Matthieu Moy 
git@matthieu-moy.fr [git@matthieu-moy.fr], John Keeping john@keeping.me.uk
[john@keeping.me.uk], Karthik Nayak karthik.188@gmail.com
[karthik.188@gmail.com], Jeff King peff@peff.net [peff@peff.net], Alex
Henrie alexhenrie24@gmail.com [alexhenrie24@gmail.com]cc: Eric Sunshine 
sunshine@sunshineco.com [sunshine@sunshineco.com]cc: Junio C Hamano 
gitster@pobox.com [gitster@pobox.com]

Philippe Blain (2):
  ref-filter: handle CRLF at end-of-line more gracefully
  log, show: add tests for messages containing CRLF

 ref-filter.c             |  36 ++++++-----
 t/t3920-crlf-messages.sh | 126 +++++++++++++++++++++++++++++++++++++++
 2 files changed, 148 insertions(+), 14 deletions(-)
 create mode 100755 t/t3920-crlf-messages.sh


base-commit: a5fa49ff0a8f3252c6bff49f92b85e7683868f8a
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-576%2Fphil-blain%2Ffix-branch-verbose-crlf-v5
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-576/phil-blain/fix-branch-verbose-crlf-v5
Pull-Request: https://github.com/gitgitgadget/git/pull/576

Range-diff vs v4:

 1:  03b2d7d78a ! 1:  06231c315f ref-filter: handle CRLF at end-of-line more gracefully
     @@ Commit message
          This impacts the output of `git branch`, `git tag` and `git
          for-each-ref`.
      
     -    This bug is a regression for `git branch --verbose`, which
     +    This behaviour is a regression for `git branch --verbose`, which
          bisects down to 949af0684c (branch: use ref-filter printing APIs,
          2017-01-10).
      
     -    Fix this bug in ref-filter by hardening the logic in `copy_subject` and
     -    `find_subpos` to correctly parse messages containing CRLF.
     +    Adjust the ref-filter code to be more lenient by hardening the logic in
     +    `copy_subject` and `find_subpos` to correctly parse messages containing
     +    CRLF.
      
          Add a new test script, 't3920-crlf-messages.sh', to test the behaviour
          of commands using either the ref-filter or the pretty APIs with messages
 2:  75a87887be ! 2:  f536fee695 log, show: add tests for messages containing CRLF
     @@ Metadata
       ## Commit message ##
          log, show: add tests for messages containing CRLF
      
     -    A previous commit fixed a bug in ref-filter.c causing messages
     -    containing CRLF to be incorrectly parsed and displayed.
     +    A previous commit adjusted the code in ref-filter.c so that messages
     +    containing CRLF are now correctly parsed and displayed.
      
          Add tests to also check that `git log` and `git show` correctly handle
          such messages, to prevent futur regressions if these commands are
     @@ t/t3920-crlf-messages.sh: test_crlf_subject_body_and_contents tag --list tag-crl
      +		cut -d" " -f2- <tmp-branch >actual-branch &&
      +		cut -d" " -f2- <tmp-tag >actual-tag &&
      +		test_cmp expect actual-branch &&
     -+		test_cmp expect actual-tag
     ++		test_cmp expect actual-tag || return 1
      +	done
      +'
      +

-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 38+ messages in thread

* [PATCH v5 1/2] ref-filter: handle CRLF at end-of-line more gracefully
  2020-10-29 12:48       ` [PATCH v5 0/2] ref-filter: handle CRLF at end-of-line more gracefully Philippe Blain via GitGitGadget
@ 2020-10-29 12:48         ` Philippe Blain via GitGitGadget
  2020-10-29 12:48         ` [PATCH v5 2/2] log, show: add tests for messages containing CRLF Philippe Blain via GitGitGadget
  1 sibling, 0 replies; 38+ messages in thread
From: Philippe Blain via GitGitGadget @ 2020-10-29 12:48 UTC (permalink / raw)
  To: git; +Cc: Philippe Blain, Philippe Blain, Philippe Blain

From: Philippe Blain <levraiphilippeblain@gmail.com>

The ref-filter code does not correctly handle commit or tag messages
that use CRLF as the line terminator. Such messages can be created with
the `--cleanup=verbatim` option of `git commit` and `git tag`, or by
using `git commit-tree` directly.

The function `find_subpos` in ref-filter.c looks for two consecutive
LFs to find the end of the subject line, a sequence which is absent in
messages using CRLF. This results in the whole message being parsed as
the subject line (`%(contents:subject)`), and the body of the message
(`%(contents:body)`) being empty.

Moreover, in `copy_subject`, which wants to return the subject as a
single line, '\n' is replaced by space, but '\r' is
untouched.

This impacts the output of `git branch`, `git tag` and `git
for-each-ref`.

This behaviour is a regression for `git branch --verbose`, which
bisects down to 949af0684c (branch: use ref-filter printing APIs,
2017-01-10).

Adjust the ref-filter code to be more lenient by hardening the logic in
`copy_subject` and `find_subpos` to correctly parse messages containing
CRLF.

Add a new test script, 't3920-crlf-messages.sh', to test the behaviour
of commands using either the ref-filter or the pretty APIs with messages
using CRLF line endings. The function `test_crlf_subject_body_and_contents`
can be used to test that the `--format` option of `branch`, `tag`,
`for-each-ref`, `log` and `show` correctly displays the subject, body
and raw content of commit and tag messages using CRLF. Test the
output of `branch`, `tag` and `for-each-ref` with such commits.

Helped-by: Junio C Hamano <gitster@pobox.com>
Helped-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Philippe Blain <levraiphilippeblain@gmail.com>
---
 ref-filter.c             |  36 ++++++++-----
 t/t3920-crlf-messages.sh | 108 +++++++++++++++++++++++++++++++++++++++
 2 files changed, 130 insertions(+), 14 deletions(-)
 create mode 100755 t/t3920-crlf-messages.sh

diff --git a/ref-filter.c b/ref-filter.c
index c62f6b4822..6476686fea 100644
--- a/ref-filter.c
+++ b/ref-filter.c
@@ -1097,14 +1097,19 @@ static const char *copy_email(const char *buf, struct used_atom *atom)
 
 static char *copy_subject(const char *buf, unsigned long len)
 {
-	char *r = xmemdupz(buf, len);
+	struct strbuf sb = STRBUF_INIT;
 	int i;
 
-	for (i = 0; i < len; i++)
-		if (r[i] == '\n')
-			r[i] = ' ';
+	for (i = 0; i < len; i++) {
+		if (buf[i] == '\r' && i + 1 < len && buf[i + 1] == '\n')
+			continue; /* ignore CR in CRLF */
 
-	return r;
+		if (buf[i] == '\n')
+			strbuf_addch(&sb, ' ');
+		else
+			strbuf_addch(&sb, buf[i]);
+	}
+	return strbuf_detach(&sb, NULL);
 }
 
 static void grab_date(const char *buf, struct atom_value *v, const char *atomname)
@@ -1228,20 +1233,23 @@ static void find_subpos(const char *buf,
 
 	/* subject is first non-empty line */
 	*sub = buf;
-	/* subject goes to first empty line */
-	while (buf < *sig && *buf && *buf != '\n') {
-		eol = strchrnul(buf, '\n');
-		if (*eol)
-			eol++;
-		buf = eol;
-	}
+	/* subject goes to first empty line before signature begins */
+	if ((eol = strstr(*sub, "\n\n"))) {
+		eol = eol < *sig ? eol : *sig;
+	/* check if message uses CRLF */
+	} else if (! (eol = strstr(*sub, "\r\n\r\n"))) {
+		/* treat whole message as subject */
+		eol = strrchr(*sub, '\0');
+	}
+	buf = eol;
 	*sublen = buf - *sub;
 	/* drop trailing newline, if present */
-	if (*sublen && (*sub)[*sublen - 1] == '\n')
+	while (*sublen && ((*sub)[*sublen - 1] == '\n' ||
+			   (*sub)[*sublen - 1] == '\r'))
 		*sublen -= 1;
 
 	/* skip any empty lines */
-	while (*buf == '\n')
+	while (*buf == '\n' || *buf == '\r')
 		buf++;
 	*body = buf;
 	*bodylen = strlen(buf);
diff --git a/t/t3920-crlf-messages.sh b/t/t3920-crlf-messages.sh
new file mode 100755
index 0000000000..3f0ce02c3f
--- /dev/null
+++ b/t/t3920-crlf-messages.sh
@@ -0,0 +1,108 @@
+#!/bin/sh
+
+test_description='Test ref-filter and pretty APIs for commit and tag messages using CRLF'
+. ./test-lib.sh
+
+LIB_CRLF_BRANCHES=""
+
+create_crlf_ref () {
+	branch="$1" &&
+	cat >.crlf-orig-$branch.txt &&
+	cat .crlf-orig-$branch.txt | append_cr >.crlf-message-$branch.txt &&
+	grep 'Subject' .crlf-orig-$branch.txt | tr '\n' ' ' | sed 's/[ ]*$//' | tr -d '\n' >.crlf-subject-$branch.txt &&
+	grep 'Body' .crlf-message-$branch.txt >.crlf-body-$branch.txt || true &&
+	LIB_CRLF_BRANCHES="${LIB_CRLF_BRANCHES} ${branch}" &&
+	test_tick &&
+	hash=$(git commit-tree HEAD^{tree} -p HEAD -F .crlf-message-${branch}.txt) &&
+	git branch ${branch} ${hash} &&
+	git tag tag-${branch} ${branch} -F .crlf-message-${branch}.txt --cleanup=verbatim
+}
+
+create_crlf_refs () {
+	create_crlf_ref crlf <<-\EOF &&
+	Subject first line
+
+	Body first line
+	Body second line
+	EOF
+	create_crlf_ref crlf-empty-lines-after-subject <<-\EOF &&
+	Subject first line
+
+
+	Body first line
+	Body second line
+	EOF
+	create_crlf_ref crlf-two-line-subject <<-\EOF &&
+	Subject first line
+	Subject second line
+
+	Body first line
+	Body second line
+	EOF
+	create_crlf_ref crlf-two-line-subject-no-body <<-\EOF &&
+	Subject first line
+	Subject second line
+	EOF
+	create_crlf_ref crlf-two-line-subject-no-body-trailing-newline <<-\EOF
+	Subject first line
+	Subject second line
+
+	EOF
+}
+
+test_crlf_subject_body_and_contents() {
+	command_and_args="$@" &&
+	command=$1 &&
+	if test ${command} = "branch" || test ${command} = "for-each-ref" || test ${command} = "tag"
+	then
+		atoms="(contents:subject) (contents:body) (contents)"
+	elif test ${command} = "log" || test ${command} = "show"
+	then
+		atoms="s b B"
+	fi &&
+	files="subject body message" &&
+	while test -n "${atoms}"
+	do
+		set ${atoms} && atom=$1 && shift && atoms="$*" &&
+		set ${files} &&	file=$1 && shift && files="$*" &&
+		test_expect_success "${command}: --format='%${atom}' works with messages using CRLF" "
+			rm -f expect &&
+			for ref in ${LIB_CRLF_BRANCHES}
+			do
+				cat .crlf-${file}-\"\${ref}\".txt >>expect &&
+				printf \"\n\" >>expect
+			done &&
+			git $command_and_args --format=\"%${atom}\" >actual &&
+			test_cmp expect actual
+		"
+	done
+}
+
+
+test_expect_success 'Setup refs with commit and tag messages using CRLF' '
+	test_commit inital &&
+	create_crlf_refs
+'
+
+test_expect_success 'branch: --verbose works with messages using CRLF' '
+	rm -f expect &&
+	for branch in $LIB_CRLF_BRANCHES
+	do
+		printf "  " >>expect &&
+		cat .crlf-subject-${branch}.txt >>expect &&
+		printf "\n" >>expect
+	done &&
+	git branch -v >tmp &&
+	# Remove first two columns, and the line for the currently checked out branch
+	current=$(git branch --show-current) &&
+	grep -v $current <tmp | awk "{\$1=\$2=\"\"}1"  >actual &&
+	test_cmp expect actual
+'
+
+test_crlf_subject_body_and_contents branch --list crlf*
+
+test_crlf_subject_body_and_contents tag --list tag-crlf*
+
+test_crlf_subject_body_and_contents for-each-ref refs/heads/crlf*
+
+test_done
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH v5 2/2] log, show: add tests for messages containing CRLF
  2020-10-29 12:48       ` [PATCH v5 0/2] ref-filter: handle CRLF at end-of-line more gracefully Philippe Blain via GitGitGadget
  2020-10-29 12:48         ` [PATCH v5 1/2] " Philippe Blain via GitGitGadget
@ 2020-10-29 12:48         ` Philippe Blain via GitGitGadget
  1 sibling, 0 replies; 38+ messages in thread
From: Philippe Blain via GitGitGadget @ 2020-10-29 12:48 UTC (permalink / raw)
  To: git; +Cc: Philippe Blain, Philippe Blain, Philippe Blain

From: Philippe Blain <levraiphilippeblain@gmail.com>

A previous commit adjusted the code in ref-filter.c so that messages
containing CRLF are now correctly parsed and displayed.

Add tests to also check that `git log` and `git show` correctly handle
such messages, to prevent futur regressions if these commands are
refactored to use the ref-filter API.

Signed-off-by: Philippe Blain <levraiphilippeblain@gmail.com>
---
 t/t3920-crlf-messages.sh | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)

diff --git a/t/t3920-crlf-messages.sh b/t/t3920-crlf-messages.sh
index 3f0ce02c3f..70ddce3a2e 100755
--- a/t/t3920-crlf-messages.sh
+++ b/t/t3920-crlf-messages.sh
@@ -105,4 +105,22 @@ test_crlf_subject_body_and_contents tag --list tag-crlf*
 
 test_crlf_subject_body_and_contents for-each-ref refs/heads/crlf*
 
+test_expect_success 'log: --oneline works with messages using CRLF' '
+	for branch in $LIB_CRLF_BRANCHES
+	do
+		cat .crlf-subject-${branch}.txt >expect &&
+		printf "\n" >>expect &&
+		git log --oneline -1 ${branch} >tmp-branch &&
+		git log --oneline -1 tag-${branch} >tmp-tag &&
+		cut -d" " -f2- <tmp-branch >actual-branch &&
+		cut -d" " -f2- <tmp-tag >actual-tag &&
+		test_cmp expect actual-branch &&
+		test_cmp expect actual-tag || return 1
+	done
+'
+
+test_crlf_subject_body_and_contents log --all --reverse --grep Subject
+
+test_crlf_subject_body_and_contents show $LIB_CRLF_BRANCHES
+
 test_done
-- 
gitgitgadget

^ permalink raw reply related	[flat|nested] 38+ messages in thread

end of thread, other threads:[~2020-10-29 12:48 UTC | newest]

Thread overview: 38+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-03-08 18:29 [PATCH 0/3] Teach ref-filter API to correctly handle CRLF in messages Philippe Blain via GitGitGadget
2020-03-08 18:29 ` [PATCH 1/3] t: add lib-crlf-messages.sh for messages containing CRLF Philippe Blain via GitGitGadget
2020-03-08 18:29 ` [PATCH 2/3] ref-filter: teach the API to correctly handle CRLF Philippe Blain via GitGitGadget
2020-03-08 18:29 ` [PATCH 3/3] log: add tests for messages containing CRLF to t4202 Philippe Blain via GitGitGadget
2020-03-09 15:14 ` [PATCH 0/3] Teach ref-filter API to correctly handle CRLF in messages Junio C Hamano
2020-03-10  2:19   ` Philippe Blain
2020-03-10  2:24 ` [PATCH v2 " Philippe Blain via GitGitGadget
2020-03-10  2:24   ` [PATCH v2 1/3] t: add lib-crlf-messages.sh for messages containing CRLF Philippe Blain via GitGitGadget
2020-03-10  2:24   ` [PATCH v2 2/3] ref-filter: fix the API to correctly handle CRLF Philippe Blain via GitGitGadget
2020-03-10 17:50     ` Junio C Hamano
2020-03-10  2:24   ` [PATCH v2 3/3] log: add tests for messages containing CRLF to t4202 Philippe Blain via GitGitGadget
2020-03-10  3:31   ` [PATCH v2 0/3] Teach ref-filter API to correctly handle CRLF in messages Junio C Hamano
2020-03-10 17:24   ` Junio C Hamano
2020-10-12 18:09   ` [PATCH v3 0/3] ref-filter: handle CRLF at end-of-line more gracefully Philippe Blain via GitGitGadget
2020-10-12 18:09     ` [PATCH v3 1/3] t: add lib-crlf-messages.sh for messages containing CRLF Philippe Blain via GitGitGadget
2020-10-12 22:22       ` Junio C Hamano
2020-10-14 13:22         ` Philippe Blain
2020-10-12 22:47       ` Eric Sunshine
2020-10-14 13:20         ` Philippe Blain
2020-10-14 13:45           ` Eric Sunshine
2020-10-14 13:52             ` Philippe Blain
2020-10-14 23:01               ` Eric Sunshine
2020-10-22  3:09         ` Philippe Blain
2020-10-12 18:09     ` [PATCH v3 2/3] ref-filter: handle CRLF at end-of-line more gracefully Philippe Blain via GitGitGadget
2020-10-12 22:24       ` Junio C Hamano
2020-10-14 13:09         ` Philippe Blain
2020-10-12 18:09     ` [PATCH v3 3/3] log, show: add tests for messages containing CRLF Philippe Blain via GitGitGadget
2020-10-22  3:01     ` [PATCH v4 0/2] ref-filter: handle CRLF at end-of-line more gracefully Philippe Blain via GitGitGadget
2020-10-22  3:01       ` [PATCH v4 1/2] " Philippe Blain via GitGitGadget
2020-10-23  0:52         ` Junio C Hamano
2020-10-23  1:46           ` Philippe Blain
2020-10-28 20:24             ` Junio C Hamano
2020-10-29  1:29               ` Philippe Blain
2020-10-22  3:01       ` [PATCH v4 2/2] log, show: add tests for messages containing CRLF Philippe Blain via GitGitGadget
2020-10-22 19:24         ` Philippe Blain
2020-10-29 12:48       ` [PATCH v5 0/2] ref-filter: handle CRLF at end-of-line more gracefully Philippe Blain via GitGitGadget
2020-10-29 12:48         ` [PATCH v5 1/2] " Philippe Blain via GitGitGadget
2020-10-29 12:48         ` [PATCH v5 2/2] log, show: add tests for messages containing CRLF Philippe Blain via GitGitGadget

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).