git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
To: git@vger.kernel.org
Cc: "Junio C Hamano" <gitster@pobox.com>, "Jeff King" <peff@peff.net>,
	"Jeffrey Walton" <noloader@gmail.com>,
	"Michał Kiedrowicz" <michal.kiedrowicz@gmail.com>,
	"J Smith" <dark.panda@gmail.com>,
	"Victor Leschuk" <vleschuk@gmail.com>,
	"Nguyễn Thái Ngọc Duy" <pclouds@gmail.com>,
	"Fredrik Kuivinen" <frekui@gmail.com>,
	"Brandon Williams" <bmwill@google.com>,
	"Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
Subject: [PATCH 13/29] grep: add tests to fix blind spots with \0 patterns
Date: Thu, 11 May 2017 09:18:13 +0000	[thread overview]
Message-ID: <20170511091829.5634-14-avarab@gmail.com> (raw)
In-Reply-To: <20170511091829.5634-1-avarab@gmail.com>

Address a big blind spot in the tests for patterns containing \0. The
is_fixed() function considers any string that contains \0 fixed, even
if it contains regular expression metacharacters, those patterns are
currently matched with kwset.

Before this change removing that memchr(s, 0, len) check from
is_fixed() wouldn't change the result of any of the tests, since
regcomp() will happily match the part before the \0.

Furthermore, the kwset path is dependent on whether the the -i flag is
on, and whether the pattern has any non-ASCII characters, but none of
this was tested for.

See the a previous commit in this series ("grep: add tests to fix
blind spots with \0 patterns", 2017-04-21) for further details &
rationale.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 t/t7008-grep-binary.sh | 71 ++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 71 insertions(+)

diff --git a/t/t7008-grep-binary.sh b/t/t7008-grep-binary.sh
index 1afba6cee9..ba3db06501 100755
--- a/t/t7008-grep-binary.sh
+++ b/t/t7008-grep-binary.sh
@@ -22,6 +22,18 @@ nul_match() {
 			printf '$pattern' | q_to_nul >f &&
 			test_must_fail git grep -f f $flags a
 		"
+	elif test $status = "T1"
+	then
+		test_expect_failure "git grep -f f $flags '$pattern_human' a" "
+			printf '$pattern' | q_to_nul >f &&
+			git grep -f f $flags a
+		"
+	elif test $status = "T0"
+	then
+		test_expect_failure "git grep -f f $flags '$pattern_human' a" "
+			printf '$pattern' | q_to_nul >f &&
+			test_must_fail git grep -f f $flags a
+		"
 	else
 		test_expect_success "PANIC: Test framework error. Unknown status $status" 'false'
 	fi
@@ -98,6 +110,65 @@ nul_match 1 '-Fi' 'YQf'
 nul_match 0 '-Fi' 'YQx'
 nul_match 1 '' 'yQf'
 nul_match 0 '' 'yQx'
+nul_match 1 '' 'æQð'
+nul_match 1 '-F' 'eQm[*]c'
+nul_match 1 '-Fi' 'EQM[*]C'
+
+# Regex patterns that would match but shouldn't with -F
+nul_match 0 '-F' 'yQ[f]'
+nul_match 0 '-F' '[y]Qf'
+nul_match 0 '-Fi' 'YQ[F]'
+nul_match 0 '-Fi' '[Y]QF'
+nul_match 0 '-F' 'æQ[ð]'
+nul_match 0 '-F' '[æ]Qð'
+nul_match 0 '-Fi' 'ÆQ[Ð]'
+nul_match 0 '-Fi' '[Æ]QÐ'
+
+# kwset is disabled on -i & non-ASCII. No way to match non-ASCII \0
+# patterns case-insensitively.
+nul_match T1 '-i' 'ÆQÐ'
+
+# \0 implicitly disables regexes. This is an undocumented internal
+# limitation.
+nul_match T1 '' 'yQ[f]'
+nul_match T1 '' '[y]Qf'
+nul_match T1 '-i' 'YQ[F]'
+nul_match T1 '-i' '[Y]Qf'
+nul_match T1 '' 'æQ[ð]'
+nul_match T1 '' '[æ]Qð'
+nul_match T1 '-i' 'ÆQ[Ð]'
+
+# ... because of \0 implicitly disabling regexes regexes that
+# should/shouldn't match don't do the right thing.
+nul_match T1 '' 'eQm.*cQ'
+nul_match T1 '-i' 'EQM.*cQ'
+nul_match T0 '' 'eQm[*]c'
+nul_match T0 '-i' 'EQM[*]C'
+
+# Due to the REG_STARTEND extension when kwset() is disabled on -i &
+# non-ASCII the string will be matched in its entirety, but the
+# pattern will be cut off at the first \0.
+nul_match 0 '-i' 'NOMATCHQð'
+nul_match T0 '-i' '[Æ]QNOMATCH'
+nul_match T0 '-i' '[æ]QNOMATCH'
+# Matches, but for the wrong reasons, just stops at [æ]
+nul_match 1 '-i' '[Æ]Qð'
+nul_match 1 '-i' '[æ]Qð'
+
+# Ensure that the matcher doesn't regress to something that stops at
+# \0
+nul_match 0 '-F' 'yQ[f]'
+nul_match 0 '-Fi' 'YQ[F]'
+nul_match 0 '' 'yQNOMATCH'
+nul_match 0 '' 'QNOMATCH'
+nul_match 0 '-i' 'YQNOMATCH'
+nul_match 0 '-i' 'QNOMATCH'
+nul_match 0 '-F' 'æQ[ð]'
+nul_match 0 '-Fi' 'ÆQ[Ð]'
+nul_match 0 '' 'yQNÓMATCH'
+nul_match 0 '' 'QNÓMATCH'
+nul_match 0 '-i' 'YQNÓMATCH'
+nul_match 0 '-i' 'QNÓMATCH'
 
 test_expect_success 'grep respects binary diff attribute' '
 	echo text >t &&
-- 
2.11.0


  parent reply	other threads:[~2017-05-11  9:19 UTC|newest]

Thread overview: 61+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-05-11  9:18 [PATCH 00/29] Easy to review grep & pre-PCRE changes Ævar Arnfjörð Bjarmason
2017-05-11  9:18 ` [PATCH 01/29] Makefile & configure: reword inaccurate comment about PCRE Ævar Arnfjörð Bjarmason
2017-05-11  9:18 ` [PATCH 02/29] grep & rev-list doc: stop promising libpcre for --perl-regexp Ævar Arnfjörð Bjarmason
2017-05-11  9:18 ` [PATCH 03/29] test-lib: rename the LIBPCRE prerequisite to PCRE Ævar Arnfjörð Bjarmason
2017-05-11  9:18 ` [PATCH 04/29] log: add exhaustive tests for pattern style options & config Ævar Arnfjörð Bjarmason
2017-05-12  4:48   ` Junio C Hamano
2017-05-13 18:02     ` Ævar Arnfjörð Bjarmason
2017-05-11  9:18 ` [PATCH 05/29] grep: add a test asserting that --perl-regexp dies when !PCRE Ævar Arnfjörð Bjarmason
2017-05-11  9:18 ` [PATCH 06/29] grep: add a test for backreferences in PCRE patterns Ævar Arnfjörð Bjarmason
2017-05-11  9:18 ` [PATCH 07/29] grep: change non-ASCII -i test to stop using --debug Ævar Arnfjörð Bjarmason
2017-05-11 18:31   ` Brandon Williams
2017-05-11 18:35     ` Ævar Arnfjörð Bjarmason
2017-05-11 20:05       ` Brandon Williams
2017-05-11  9:18 ` [PATCH 08/29] grep: add tests for --threads=N and grep.threads Ævar Arnfjörð Bjarmason
2017-05-11 18:36   ` Brandon Williams
2017-05-11 19:22     ` Ævar Arnfjörð Bjarmason
2017-05-11  9:18 ` [PATCH 09/29] grep: amend submodule recursion test for regex engine testing Ævar Arnfjörð Bjarmason
2017-05-12  4:59   ` Junio C Hamano
2017-05-13 17:33     ` Ævar Arnfjörð Bjarmason
2017-05-11  9:18 ` [PATCH 10/29] grep: add tests for grep pattern types being passed to submodules Ævar Arnfjörð Bjarmason
2017-05-11  9:18 ` [PATCH 11/29] grep: add a test helper function for less verbose -f \0 tests Ævar Arnfjörð Bjarmason
2017-05-12  5:06   ` Junio C Hamano
2017-05-13 11:33     ` Ævar Arnfjörð Bjarmason
2017-05-15  1:29       ` Junio C Hamano
2017-05-11  9:18 ` [PATCH 12/29] grep: prepare for testing binary regexes containing rx metacharacters Ævar Arnfjörð Bjarmason
2017-05-11  9:18 ` Ævar Arnfjörð Bjarmason [this message]
2017-05-11  9:18 ` [PATCH 14/29] perf: add a GIT_PERF_MAKE_COMMAND for when *_MAKE_OPTS won't do Ævar Arnfjörð Bjarmason
2017-05-11  9:18 ` [PATCH 15/29] perf: emit progress output when unpacking & building Ævar Arnfjörð Bjarmason
2017-05-11  9:18 ` [PATCH 16/29] perf: add a performance comparison test of grep -G, -E and -P Ævar Arnfjörð Bjarmason
2017-05-11  9:18 ` [PATCH 17/29] perf: add a performance comparison of fixed-string grep Ævar Arnfjörð Bjarmason
2017-05-11  9:18 ` [PATCH 18/29] grep: catch a missing enum in switch statement Ævar Arnfjörð Bjarmason
2017-05-11 20:08   ` Brandon Williams
2017-05-11 20:40     ` Stefan Beller
2017-05-11 20:50       ` Ævar Arnfjörð Bjarmason
2017-05-11  9:18 ` [PATCH 19/29] grep: remove redundant regflags assignment under PCRE Ævar Arnfjörð Bjarmason
2017-05-11  9:18 ` [PATCH 20/29] grep: remove redundant `regflags &= ~REG_EXTENDED` assignments Ævar Arnfjörð Bjarmason
2017-05-11  9:18 ` [PATCH 21/29] grep: factor test for \0 in grep patterns into a function Ævar Arnfjörð Bjarmason
2017-05-11 20:15   ` Brandon Williams
2017-05-11 20:22     ` Ævar Arnfjörð Bjarmason
2017-05-11  9:18 ` [PATCH 22/29] grep: change the internal PCRE macro names to be PCRE1 Ævar Arnfjörð Bjarmason
2017-05-11 20:10   ` Brandon Williams
2017-05-11  9:18 ` [PATCH 23/29] grep: change internal *pcre* variable & function names to be *pcre1* Ævar Arnfjörð Bjarmason
2017-05-11  9:18 ` [PATCH 24/29] grep: move two functions to avoid forward declaration Ævar Arnfjörð Bjarmason
2017-05-11 20:14   ` Brandon Williams
2017-05-11 20:20     ` Ævar Arnfjörð Bjarmason
2017-05-11  9:18 ` [PATCH 25/29] test-lib: add a PTHREADS prerequisite Ævar Arnfjörð Bjarmason
2017-05-11  9:18 ` [PATCH 26/29] pack-objects & index-pack: add test for --threads warning Ævar Arnfjörð Bjarmason
2017-05-11 20:17   ` Brandon Williams
2017-05-11 20:34     ` Ævar Arnfjörð Bjarmason
2017-05-11  9:18 ` [PATCH 27/29] pack-objects: fix buggy warning about threads Ævar Arnfjörð Bjarmason
2017-05-11  9:18 ` [PATCH 28/29] grep: given --threads with NO_PTHREADS=YesPlease, warn Ævar Arnfjörð Bjarmason
2017-05-11 20:21   ` Brandon Williams
2017-05-11 20:33     ` Ævar Arnfjörð Bjarmason
2017-05-11 20:43       ` Brandon Williams
2017-05-11 21:20         ` [PATCH] C style: use standard style for "TRANSLATORS" comments Ævar Arnfjörð Bjarmason
2017-05-11 21:41           ` Jonathan Nieder
2017-05-11 21:50             ` Brandon Williams
2017-05-12  5:20           ` Johannes Sixt
2017-05-30 16:02           ` Jiang Xin
2017-05-30 23:03             ` Junio C Hamano
2017-05-11  9:18 ` [PATCH 29/29] grep: assert that threading is enabled when calling grep_{lock,unlock} Ævar Arnfjörð Bjarmason

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20170511091829.5634-14-avarab@gmail.com \
    --to=avarab@gmail.com \
    --cc=bmwill@google.com \
    --cc=dark.panda@gmail.com \
    --cc=frekui@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=michal.kiedrowicz@gmail.com \
    --cc=noloader@gmail.com \
    --cc=pclouds@gmail.com \
    --cc=peff@peff.net \
    --cc=vleschuk@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).