git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Eric Sunshine <sunshine@sunshineco.com>
To: git@vger.kernel.org
Cc: Jonathan Nieder <jrnieder@gmail.com>, Jeff King <peff@peff.net>,
	Eric Sunshine <sunshine@sunshineco.com>
Subject: [PATCH 1/5] chainlint: match arbitrary here-docs tags rather than hard-coded names
Date: Tue,  7 Aug 2018 04:21:31 -0400	[thread overview]
Message-ID: <20180807082135.60913-2-sunshine@sunshineco.com> (raw)
In-Reply-To: <20180807082135.60913-1-sunshine@sunshineco.com>

chainlint.sed swallows top-level here-docs to avoid being fooled by
content which might look like start-of-subshell. It likewise swallows
here-docs in subshells to avoid marking content lines as breaking the
&&-chain, and to avoid being fooled by content which might look like
end-of-subshell, start-of-nested-subshell, or other specially-recognized
constructs.

At the time of implementation, it was believed that it was not possible
to support arbitrary here-doc tag names since 'sed' provides no way to
stash the opening tag name in a variable for later comparison against a
line signaling end-of-here-doc. Consequently, tag names are hard-coded,
with "EOF" being the only tag recognized at the top-level, and only
"EOF", "EOT", and "INPUT_END" being recognized within subshells. Also,
special care was taken to avoid being confused by here-docs nested
within other here-docs.

In practice, this limited number of hard-coded tag names has been "good
enough" for the 13000+ existing Git test, despite many of those tests
using tags other than the recognized ones, since the bodies of those
here-docs do not contain content which would fool the linter.
Nevertheless, the situation is not ideal since someone writing new
tests, and choosing a name not in the "blessed" set could potentially
trigger a false-positive.

To address this shortcoming, upgrade chainlint.sed to handle arbitrary
here-doc tag names, both at the top-level and within subshells.

Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
---
 t/chainlint.sed                      | 57 +++++++++++++++++-----------
 t/chainlint/here-doc.expect          |  2 +
 t/chainlint/here-doc.test            |  7 ++++
 t/chainlint/nested-here-doc.expect   |  2 +
 t/chainlint/nested-here-doc.test     | 10 +++++
 t/chainlint/subshell-here-doc.expect |  4 ++
 t/chainlint/subshell-here-doc.test   |  8 ++++
 7 files changed, 67 insertions(+), 23 deletions(-)

diff --git a/t/chainlint.sed b/t/chainlint.sed
index 5f0882cb38..bd76c5d181 100644
--- a/t/chainlint.sed
+++ b/t/chainlint.sed
@@ -61,6 +61,22 @@
 # "else", and "fi" in if-then-else likewise must not end with "&&", thus
 # receives similar treatment.
 #
+# Swallowing here-docs with arbitrary tags requires a bit of finesse. When a
+# line such as "cat <<EOF >out" is seen, the here-doc tag is moved to the front
+# of the line enclosed in angle brackets as a sentinel, giving "<EOF>cat >out".
+# As each subsequent line is read, it is appended to the target line and a
+# (whitespace-loose) back-reference match /^<(.*)>\n\1$/ is attempted to see if
+# the content inside "<...>" matches the entirety of the newly-read line. For
+# instance, if the next line read is "some data", when concatenated with the
+# target line, it becomes "<EOF>cat >out\nsome data", and a match is attempted
+# to see if "EOF" matches "some data". Since it doesn't, the next line is
+# attempted. When a line consisting of only "EOF" (and possible whitespace) is
+# encountered, it is appended to the target line giving "<EOF>cat >out\nEOF",
+# in which case the "EOF" inside "<...>" does match the text following the
+# newline, thus the closing here-doc tag has been found. The closing tag line
+# and the "<...>" prefix on the target line are then discarded, leaving just
+# the target line "cat >out".
+#
 # To facilitate regression testing (and manual debugging), a ">" annotation is
 # applied to the line containing ")" which closes a subshell, ">>" to a line
 # closing a nested subshell, and ">>>" to a line closing both at once. This
@@ -78,14 +94,17 @@
 
 # here-doc -- swallow it to avoid false hits within its body (but keep the
 # command to which it was attached)
-/<<[ 	]*[-\\]*EOF[ 	]*/ {
-	s/[ 	]*<<[ 	]*[-\\]*EOF//
-	h
+/<<[ 	]*[-\\]*[A-Z0-9_][A-Z0-9_]*/ {
+	s/^\(.*\)<<[ 	]*[-\\]*\([A-Z0-9_][A-Z0-9_]*\)/<\2>\1<</
+	s/[ 	]*<<//
 	:hereslurp
 	N
-	s/.*\n//
-	/^[ 	]*EOF[ 	]*$/!bhereslurp
-	x
+	/^<\([^>]*\)>.*\n[ 	]*\1[ 	]*$/!{
+		s/\n.*$//
+		bhereslurp
+	}
+	s/^<[^>]*>//
+	s/\n.*$//
 }
 
 # one-liner "(...) &&"
@@ -139,9 +158,7 @@ s/.*\n//
 	/"[^'"]*'[^'"]*"/!bsqstring
 }
 # here-doc -- swallow it
-/<<[ 	]*[-\\]*EOF/bheredoc
-/<<[ 	]*[-\\]*EOT/bheredoc
-/<<[ 	]*[-\\]*INPUT_END/bheredoc
+/<<[ 	]*[-\\]*[A-Z0-9_][A-Z0-9_]*/bheredoc
 # comment or empty line -- discard since final non-comment, non-empty line
 # before closing ")", "done", "elsif", "else", or "fi" will need to be
 # re-visited to drop "suspect" marking since final line of those constructs
@@ -249,23 +266,17 @@ s/\n//
 bcheckchain
 
 # found here-doc -- swallow it to avoid false hits within its body (but keep
-# the command to which it was attached); take care to handle here-docs nested
-# within here-docs by only recognizing closing tag matching outer here-doc
-# opening tag
+# the command to which it was attached)
 :heredoc
-/EOF/{ s/[ 	]*<<[ 	]*[-\\]*EOF//; s/^/EOF/; }
-/EOT/{ s/[ 	]*<<[ 	]*[-\\]*EOT//; s/^/EOT/; }
-/INPUT_END/{ s/[ 	]*<<[ 	]*[-\\]*INPUT_END//; s/^/INPUT_END/; }
+s/^\(.*\)<<[ 	]*[-\\]*\([A-Z0-9_][A-Z0-9_]*\)/<\2>\1<</
+s/[ 	]*<<//
 :hereslurpsub
 N
-/^EOF.*\n[ 	]*EOF[ 	]*$/bhereclose
-/^EOT.*\n[ 	]*EOT[ 	]*$/bhereclose
-/^INPUT_END.*\n[ 	]*INPUT_END[ 	]*$/bhereclose
-bhereslurpsub
-:hereclose
-s/^EOF//
-s/^EOT//
-s/^INPUT_END//
+/^<\([^>]*\)>.*\n[ 	]*\1[ 	]*$/!{
+	s/\n.*$//
+	bhereslurpsub
+}
+s/^<[^>]*>//
 s/\n.*$//
 bcheckchain
 
diff --git a/t/chainlint/here-doc.expect b/t/chainlint/here-doc.expect
index 2328fe7753..33bc3cc0b4 100644
--- a/t/chainlint/here-doc.expect
+++ b/t/chainlint/here-doc.expect
@@ -1,3 +1,5 @@
 boodle wobba        gorgo snoot        wafta snurb &&
 
+cat >foo &&
+
 horticulture
diff --git a/t/chainlint/here-doc.test b/t/chainlint/here-doc.test
index bd36f6e1d3..3736fd2cb4 100644
--- a/t/chainlint/here-doc.test
+++ b/t/chainlint/here-doc.test
@@ -7,6 +7,13 @@ quoth the raven,
 nevermore...
 EOF
 
+# LINT: swallow here-doc with arbitrary tag
+cat <<-ARBITRARY >foo &&
+snoz
+boz
+woz
+ARBITRARY
+
 # LINT: swallow here-doc (EOF is last line of test)
 horticulture <<\EOF
 gomez
diff --git a/t/chainlint/nested-here-doc.expect b/t/chainlint/nested-here-doc.expect
index 559301e005..0c9ef1cfc6 100644
--- a/t/chainlint/nested-here-doc.expect
+++ b/t/chainlint/nested-here-doc.expect
@@ -1,3 +1,5 @@
+cat >foop &&
+
 (
 	cat &&
 ?!AMP?!	cat
diff --git a/t/chainlint/nested-here-doc.test b/t/chainlint/nested-here-doc.test
index 027e0bb3ff..f35404bf0f 100644
--- a/t/chainlint/nested-here-doc.test
+++ b/t/chainlint/nested-here-doc.test
@@ -1,3 +1,13 @@
+# LINT: inner "EOF" not misintrepreted as closing ARBITRARY here-doc
+cat <<ARBITRARY >foop &&
+naddle
+fub <<EOF
+	nozzle
+	noodle
+EOF
+formp
+ARBITRARY
+
 (
 # LINT: inner "EOF" not misintrepreted as closing INPUT_END here-doc
 	cat <<-\INPUT_END &&
diff --git a/t/chainlint/subshell-here-doc.expect b/t/chainlint/subshell-here-doc.expect
index 19d5aff233..7c2da63bc7 100644
--- a/t/chainlint/subshell-here-doc.expect
+++ b/t/chainlint/subshell-here-doc.expect
@@ -2,4 +2,8 @@
 	echo wobba 	       gorgo snoot 	       wafta snurb &&
 ?!AMP?!	cat >bip
 	echo >bop
+>) &&
+(
+	cat >bup &&
+	meep
 >)
diff --git a/t/chainlint/subshell-here-doc.test b/t/chainlint/subshell-here-doc.test
index 9c3564c247..05139af0b5 100644
--- a/t/chainlint/subshell-here-doc.test
+++ b/t/chainlint/subshell-here-doc.test
@@ -20,4 +20,12 @@
 	wednesday
 	pugsly
 	EOF
+) &&
+(
+# LINT: swallow here-doc with arbitrary tag
+	cat <<-\ARBITRARY >bup &&
+	glink
+	FIZZ
+	ARBITRARY
+	meep
 )
-- 
2.18.0.758.g1932418f46


  reply	other threads:[~2018-08-07  8:22 UTC|newest]

Thread overview: 123+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-06-26  7:29 [PATCH 00/29] t: detect and fix broken &&-chains in subshells Eric Sunshine
2018-06-26  7:29 ` [PATCH 01/29] t7508: use test_when_finished() instead of managing exit code manually Eric Sunshine
2018-06-26  7:29 ` [PATCH 02/29] t0001: use "{...}" block around "||" expression rather than subshell Eric Sunshine
2018-06-26  7:29 ` [PATCH 03/29] t1300: use sane_unset() to avoid breaking &&-chain Eric Sunshine
2018-06-26  7:29 ` [PATCH 04/29] t3303: use standard here-doc tag "EOF" to avoid fooling --chain-lint Eric Sunshine
2018-06-26  7:29 ` [PATCH 05/29] t5505: modernize and simplify hard-to-digest test Eric Sunshine
2018-06-26  7:29 ` [PATCH 06/29] t6036: fix broken "merge fails but has appropriate contents" tests Eric Sunshine
2018-06-26  8:44   ` Elijah Newren
2018-06-26  7:29 ` [PATCH 07/29] t7201: drop pointless "exit 0" at end of subshell Eric Sunshine
2018-06-26  7:29 ` [PATCH 08/29] t7400: fix broken "submodule add/reconfigure --force" test Eric Sunshine
2018-06-27 18:04   ` Stefan Beller
2018-06-26  7:29 ` [PATCH 09/29] t7810: use test_expect_code() instead of hand-rolled comparison Eric Sunshine
2018-06-26  7:29 ` [PATCH 10/29] t9001: fix broken "invoke hook" test Eric Sunshine
2018-06-26 17:07   ` Jonathan Tan
2018-06-26  7:29 ` [PATCH 11/29] t9104: use "{...}" block around "||" expression rather than subshell Eric Sunshine
2018-06-26  7:29 ` [PATCH 12/29] t9401: drop unnecessary nested subshell Eric Sunshine
2018-06-26  7:29 ` [PATCH 13/29] t/lib-submodule-update: fix broken "replace submodule must-fail" test Eric Sunshine
2018-06-27 18:30   ` [PATCH] t/lib-submodule-update: fix absorbing test Stefan Beller
2018-06-27 18:38     ` Eric Sunshine
2018-06-26  7:29 ` [PATCH 14/29] t: drop subshell with missing &&-chain in favor of simpler construct Eric Sunshine
2018-06-26 19:31   ` Junio C Hamano
2018-06-26 20:06     ` Eric Sunshine
2018-06-26  7:29 ` [PATCH 15/29] t: drop unnecessary terminating semicolons in subshell Eric Sunshine
2018-06-26  7:29 ` [PATCH 16/29] t: use test_might_fail() instead of manipulating exit code manually Eric Sunshine
2018-06-26  7:29 ` [PATCH 17/29] t: use test_must_fail() instead of checking " Eric Sunshine
2018-06-26  7:59   ` Luke Diamand
2018-06-26  8:58   ` Elijah Newren
2018-06-26  9:21     ` Eric Sunshine
2018-06-26 18:05       ` Johannes Sixt
2018-06-26 18:14         ` Eric Sunshine
2018-06-26 21:00           ` Johannes Sixt
2018-06-26  7:29 ` [PATCH 18/29] t0000-t0999: fix broken &&-chains in subshells Eric Sunshine
2018-06-26  7:29 ` [PATCH 19/29] t1000-t1999: " Eric Sunshine
2018-06-26  7:29 ` [PATCH 20/29] t2000-t2999: " Eric Sunshine
2018-06-26  7:29 ` [PATCH 21/29] t3000-t3999: " Eric Sunshine
2018-06-26  7:29 ` [PATCH 22/29] t3030: " Eric Sunshine
2018-06-26  7:29 ` [PATCH 23/29] t4000-t4999: " Eric Sunshine
2018-06-26  7:29 ` [PATCH 24/29] t5000-t5999: " Eric Sunshine
2018-06-26  7:29 ` [PATCH 25/29] t6000-t6999: " Eric Sunshine
2018-06-26  7:29 ` [PATCH 26/29] t7000-t7999: " Eric Sunshine
2018-06-26  7:29 ` [PATCH 27/29] t9000-t9999: " Eric Sunshine
2018-06-26  7:30 ` [PATCH 28/29] t9119: " Eric Sunshine
2018-06-26  7:30 ` [PATCH 29/29] t/test-lib: teach --chain-lint to detect " Eric Sunshine
2018-06-26 19:15   ` Junio C Hamano
2018-06-26 19:52     ` Eric Sunshine
2018-06-26 20:17       ` Jeff King
2018-06-26 20:22         ` Jeff King
2018-06-26 20:59           ` Eric Sunshine
2018-06-26 21:33           ` Elijah Newren
2018-06-26 21:42             ` Eric Sunshine
2018-06-26 20:46         ` Eric Sunshine
2018-06-26 21:01           ` Jeff King
2018-06-26 21:13             ` Eric Sunshine
2018-06-28 14:35               ` Jeff King
2018-06-27  2:15             ` Elijah Newren
2018-06-27  6:27               ` Johannes Sixt
2018-06-27  6:48                 ` Eric Sunshine
2018-06-28 14:37               ` Jeff King
2018-06-26 21:09         ` Junio C Hamano
2018-06-26  9:20 ` [PATCH 00/29] t: detect and fix " Elijah Newren
2018-06-26  9:31   ` Eric Sunshine
2018-06-26 15:34     ` Elijah Newren
2018-06-26 19:38 ` Junio C Hamano
2018-06-26 21:25   ` Eric Sunshine
2018-06-26 22:31     ` Junio C Hamano
2018-06-27  0:22       ` Jonathan Nieder
2018-07-11  6:46 ` [PATCH v2 00/10] detect " Eric Sunshine
2018-07-11  6:46   ` [PATCH v2 01/10] t/test-lib: teach --chain-lint to " Eric Sunshine
2018-07-11 21:37     ` Junio C Hamano
2018-07-12 10:50       ` Eric Sunshine
2018-07-12 16:56         ` Jeff King
2018-07-12 19:32           ` Eric Sunshine
2018-07-12 19:54             ` Junio C Hamano
2018-07-30 18:13     ` Jonathan Nieder
2018-07-30 19:06       ` [PATCH 0/2] subtree: fix &&-chain and simplify tests (Re: [PATCH v2 01/10] t/test-lib: teach --chain-lint to detect broken &&-chains in subshells) Jonathan Nieder
2018-07-30 19:07         ` [PATCH 1/2] subtree test: add missing && to &&-chain Jonathan Nieder
2018-07-30 19:07         ` [PATCH 2/2] subtree test: simplify preparation of expected results Jonathan Nieder
2018-07-30 20:25       ` [PATCH v2 01/10] t/test-lib: teach --chain-lint to detect broken &&-chains in subshells Eric Sunshine
2018-07-30 20:59         ` Jonathan Nieder
2018-07-30 21:38           ` Eric Sunshine
2018-07-31 12:50             ` Jeff King
2018-07-31 18:55               ` Eric Sunshine
2018-07-31 19:08                 ` Jeff King
2018-08-23 18:02     ` Ævar Arnfjörð Bjarmason
2018-08-23 18:27       ` Eric Sunshine
2018-07-11  6:46   ` [PATCH v2 02/10] t/Makefile: add machinery to check correctness of chainlint.sed Eric Sunshine
2018-07-11  6:46   ` [PATCH v2 03/10] t/chainlint: add chainlint "basic" test cases Eric Sunshine
2018-07-11  6:46   ` [PATCH v2 04/10] t/chainlint: add chainlint "whitespace" " Eric Sunshine
2018-07-11  6:46   ` [PATCH v2 05/10] t/chainlint: add chainlint "one-liner" " Eric Sunshine
2018-07-11  6:46   ` [PATCH v2 06/10] t/chainlint: add chainlint "nested subshell" " Eric Sunshine
2018-07-11  6:46   ` [PATCH v2 07/10] t/chainlint: add chainlint "loop" and "conditional" " Eric Sunshine
2018-07-11  6:46   ` [PATCH v2 08/10] t/chainlint: add chainlint "cuddled" " Eric Sunshine
2018-07-11  6:46   ` [PATCH v2 09/10] t/chainlint: add chainlint "complex" " Eric Sunshine
2018-07-11  6:46   ` [PATCH v2 10/10] t/chainlint: add chainlint "specialized" " Eric Sunshine
2018-08-07  8:21   ` [PATCH 0/5] chainlint: improve robustness against "unusual" shell coding Eric Sunshine
2018-08-07  8:21     ` Eric Sunshine [this message]
2018-08-08 22:50       ` [PATCH 1/5] chainlint: match arbitrary here-docs tags rather than hard-coded names Jeff King
2018-08-09  5:58         ` Eric Sunshine
2018-08-09 14:26           ` Jeff King
2018-08-07  8:21     ` [PATCH 2/5] chainlint: recognize multi-line $(...) when command cuddled with "$(" Eric Sunshine
2018-08-07  8:21     ` [PATCH 3/5] chainlint: let here-doc and multi-line string commence on same line Eric Sunshine
2018-08-07  8:21     ` [PATCH 4/5] chainlint: recognize multi-line quoted strings more robustly Eric Sunshine
2018-08-07  8:21     ` [PATCH 5/5] chainlint: add test of pathological case which triggered false positive Eric Sunshine
2018-08-08 22:53     ` [PATCH 0/5] chainlint: improve robustness against "unusual" shell coding Jeff King
2018-08-09  0:44       ` Junio C Hamano
2018-08-13  8:47     ` [PATCH v2 0/6] " Eric Sunshine
2018-08-13  8:47       ` [PATCH v2 1/6] chainlint: match arbitrary here-docs tags rather than hard-coded names Eric Sunshine
2018-08-13  8:47       ` [PATCH v2 2/6] chainlint: match 'quoted' here-doc tags Eric Sunshine
2018-08-13 19:27         ` Junio C Hamano
2018-08-13 20:12           ` Eric Sunshine
2018-08-13  8:47       ` [PATCH v2 3/6] chainlint: recognize multi-line $(...) when command cuddled with "$(" Eric Sunshine
2018-08-13  8:47       ` [PATCH v2 4/6] chainlint: let here-doc and multi-line string commence on same line Eric Sunshine
2018-08-13  8:47       ` [PATCH v2 5/6] chainlint: recognize multi-line quoted strings more robustly Eric Sunshine
2018-08-13  8:47       ` [PATCH v2 6/6] chainlint: add test of pathological case which triggered false positive Eric Sunshine
2018-08-15 18:45       ` [PATCH v3 0/6] chainlint: improve robustness against "unusual" shell coding Eric Sunshine
2018-08-15 18:45         ` [PATCH v3 1/6] chainlint: match arbitrary here-docs tags rather than hard-coded names Eric Sunshine
2018-08-15 18:45         ` [PATCH v3 2/6] chainlint: match quoted here-doc tags Eric Sunshine
2018-08-15 18:45         ` [PATCH v3 3/6] chainlint: recognize multi-line $(...) when command cuddled with "$(" Eric Sunshine
2018-08-15 18:45         ` [PATCH v3 4/6] chainlint: let here-doc and multi-line string commence on same line Eric Sunshine
2018-08-15 18:45         ` [PATCH v3 5/6] chainlint: recognize multi-line quoted strings more robustly Eric Sunshine
2018-08-15 18:45         ` [PATCH v3 6/6] chainlint: add test of pathological case which triggered false positive Eric Sunshine
2018-08-29  9:45         ` [PATCH] chainlint: match "quoted" here-doc tags Eric Sunshine
2018-08-29 17:57           ` Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20180807082135.60913-2-sunshine@sunshineco.com \
    --to=sunshine@sunshineco.com \
    --cc=git@vger.kernel.org \
    --cc=jrnieder@gmail.com \
    --cc=peff@peff.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).