git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Todd Zullinger <tmz@pobox.com>
To: Andreas Schwab <schwab@linux-m68k.org>
Cc: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>,
	git@vger.kernel.org, "Junio C Hamano" <gitster@pobox.com>,
	"Carlo Marcelo Arenas Belón" <carenas@gmail.com>,
	"Beat Bolli" <dev+git@drbeat.li>,
	"Johannes Schindelin" <Johannes.Schindelin@gmx.de>,
	"Jeff King" <peff@peff.net>
Subject: [PATCH] t7812: expect failure for grep -i with invalid UTF-8 data
Date: Fri, 29 Nov 2019 19:46:53 -0500	[thread overview]
Message-ID: <20191130004653.8794-1-tmz@pobox.com> (raw)
In-Reply-To: <87blsyl32c.fsf_-_@igel.home>

When the 'grep with invalid UTF-8 data' tests were added/adjusted in
8a5999838e (grep: stess test PCRE v2 on invalid UTF-8 data, 2019-07-26)
and 870eea8166 (grep: do not enter PCRE2_UTF mode on fixed matching,
2019-07-26) they lacked a redirect which caused them to falsely succeed
on most architectures.  They failed on big-endian arches where the test
never reached the portion which was missing the redirect.

A recent patch add the missing redirect and exposed the fact that the
'PCRE v2: grep non-ASCII from invalid UTF-8 data with -i' test fails on
all architectures.

Based on the final paragraph in in 870eea8166:

    When grepping a non-ASCII fixed string. This is a more general problem
    that's hard to fix, but we can at least fix the most common case of
    grepping for a fixed string without "-i". I can't think of a reason
    for why we'd turn on PCRE2_UTF when matching byte-for-byte like that.

it seems that we don't expect that the case-insensitive grep will
succeed.  Adjust the test to reflect that expectation.

Signed-off-by: Todd Zullinger <tmz@pobox.com>
---

Hi,

Andreas Schwab wrote:
> Two tests in t7812 were missing redirects, failing to actually test the
> produced output.
> 
> Fixes: 8a5999838e ("grep: stess test PCRE v2 on invalid UTF-8 data")
> Signed-off-by: Andreas Schwab <schwab@linux-m68k.org>

Nice catch on these missing redirects.  While testing the
2.24.0 rc's this test failed only on big-endian arches (like
s390x in Fedora).  It was briefly mentioned at the time in
<20191020002648.GZ10893@pobox.com>.

After applying the fix to add the missing redirects, the
test now fails on all architectures.  It seems like that is
the intended state and we simply need to adjust the final
test with `grep -i` accordingly.

Of course, it's possible that I've misunderstood Ævar's
intentions in 870eea8166 (grep: do not enter PCRE2_UTF mode
on fixed matching, 2019-07-26) or otherwise have poorly
explained the reasoning in my commit message.

We don't appear to test PCRE (neither v1 nor v2) in our CI
builds.  I added libpcre2-dev and set USE_LIBPCRE to test
this in travis.  (Whether we want to enable that by default
is worth discussing.)

Doing so confirms that the tests (incorrectly) pass without
the missing redirects and fail when the redirects are added.
Inverting the expectation of test_cmp in the final `grep -i`
test results in a successful test run.

with PCRE2:
https://travis-ci.org/tmzullinger/git/jobs/618733182

with PCRE2 and 'add missing redirects':
https://travis-ci.org/tmzullinger/git/jobs/618723277

with PCRE2, 'add missing redirects' and this patch:
https://travis-ci.org/tmzullinger/git/jobs/618796963

This still doesn't fix the failure on s390x.  There

    test_might_fail git grep -hi "Æ" invalid-0x80 >actual

fails and leaves nothing in 'actual' which causes the test
to fail at the following

    test_cmp expected actual

line.  If we suspect the test might fail, it seems like we
need to account for that and not expect that 'expect' will
match 'actual' as we currently do.

 t/t7812-grep-icase-non-ascii.sh | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/t/t7812-grep-icase-non-ascii.sh b/t/t7812-grep-icase-non-ascii.sh
index c4528432e5..03dba6685a 100755
--- a/t/t7812-grep-icase-non-ascii.sh
+++ b/t/t7812-grep-icase-non-ascii.sh
@@ -76,9 +76,12 @@ test_expect_success GETTEXT_LOCALE,LIBPCRE2 'PCRE v2: grep non-ASCII from invali
 
 test_expect_success GETTEXT_LOCALE,LIBPCRE2 'PCRE v2: grep non-ASCII from invalid UTF-8 data with -i' '
 	test_might_fail git grep -hi "Æ" invalid-0x80 >actual &&
-	test_cmp expected actual &&
+	if test -s actual
+	then
+	    test_cmp expected actual
+	fi &&
 	test_must_fail git grep -hi "(*NO_JIT)Æ" invalid-0x80 >actual &&
-	test_cmp expected actual
+	! test_cmp expected actual
 '
 
 test_done
-- 
2.24.0


  parent reply	other threads:[~2019-11-30  0:47 UTC|newest]

Thread overview: 70+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-07-21 19:40 [PATCH] grep: use custom JIT stack with pcre2 Carlo Marcelo Arenas Belón
2019-07-24 15:14 ` [PATCH 0/3] grep: PCRE JIT fixes Ævar Arnfjörð Bjarmason
2019-07-24 16:18   ` Junio C Hamano
2019-07-24 20:03     ` Ævar Arnfjörð Bjarmason
2019-07-26 15:08   ` [PATCH v2 0/8] grep: PCRE JIT fixes + ab/no-kwset fix Ævar Arnfjörð Bjarmason
2019-07-26 20:27     ` Junio C Hamano
2019-07-29  9:20       ` Ævar Arnfjörð Bjarmason
2019-07-29 16:12         ` Junio C Hamano
2019-07-26 15:08   ` [PATCH v2 1/8] grep: remove overly paranoid BUG(...) code Ævar Arnfjörð Bjarmason
2019-07-26 15:08   ` [PATCH v2 2/8] grep: stop "using" a custom JIT stack with PCRE v2 Ævar Arnfjörð Bjarmason
2019-07-29  0:33     ` Carlo Arenas
2019-07-26 15:08   ` [PATCH v2 3/8] grep: stop using a custom JIT stack with PCRE v1 Ævar Arnfjörð Bjarmason
2019-07-29  1:26     ` Carlo Arenas
2019-07-26 15:08   ` [PATCH v2 4/8] grep: consistently use "p->fixed" in compile_regexp() Ævar Arnfjörð Bjarmason
2019-07-29  1:48     ` Carlo Arenas
2019-07-29  9:05       ` Ævar Arnfjörð Bjarmason
2019-07-29  9:13         ` Ævar Arnfjörð Bjarmason
2019-07-29 16:23           ` Junio C Hamano
2019-07-26 15:08   ` [PATCH v2 5/8] grep: create a "is_fixed" member in "grep_pat" Ævar Arnfjörð Bjarmason
2019-07-26 15:08   ` [PATCH v2 6/8] grep: stess test PCRE v2 on invalid UTF-8 data Ævar Arnfjörð Bjarmason
2019-07-26 20:34     ` Junio C Hamano
2019-07-26 21:55       ` Ævar Arnfjörð Bjarmason
2019-07-29  3:06     ` Carlo Arenas
2019-11-26 21:50     ` [PATCH] t7812: add missing redirects Andreas Schwab
2019-11-26 22:27       ` Johannes Schindelin
2019-11-26 23:11         ` Andreas Schwab
2019-11-27 11:58           ` Jeff King
2019-11-30  0:46       ` Todd Zullinger [this message]
2019-11-30  8:00         ` [PATCH] t7812: expect failure for grep -i with invalid UTF-8 data Andreas Schwab
2019-12-01 16:33           ` Junio C Hamano
2019-12-01 17:09             ` Andreas Schwab
2019-12-01 18:32             ` Todd Zullinger
2019-12-02  6:13               ` Junio C Hamano
2019-07-26 15:08   ` [PATCH v2 7/8] grep: do not enter PCRE2_UTF mode on fixed matching Ævar Arnfjörð Bjarmason
2019-07-26 20:36     ` Junio C Hamano
2019-07-26 15:08   ` [PATCH v2 8/8] grep: optimistically use PCRE2_MATCH_INVALID_UTF Ævar Arnfjörð Bjarmason
2019-07-26 21:07     ` Junio C Hamano
2019-07-26 21:53       ` Ævar Arnfjörð Bjarmason
2019-07-26 21:57         ` Ævar Arnfjörð Bjarmason
2021-01-24  2:12     ` [PATCH v3 0/4] grep: better support invalid UTF-8 haystacks Ævar Arnfjörð Bjarmason
2021-01-24 11:48       ` [PATCH v4 0/2] " Ævar Arnfjörð Bjarmason
2021-01-24 17:28         ` [PATCH v5 " Ævar Arnfjörð Bjarmason
2021-01-24 17:28         ` [PATCH v5 1/2] grep/pcre2 tests: don't rely on invalid UTF-8 data test Ævar Arnfjörð Bjarmason
2021-01-24 17:28         ` [PATCH v5 2/2] grep/pcre2: better support invalid UTF-8 haystacks Ævar Arnfjörð Bjarmason
2021-01-24 11:48       ` [PATCH v4 1/2] grep/pcre2 tests: don't rely on invalid UTF-8 data test Ævar Arnfjörð Bjarmason
2021-01-24 11:48       ` [PATCH v4 2/2] grep/pcre2: better support invalid UTF-8 haystacks Ævar Arnfjörð Bjarmason
2021-01-24 13:53         ` Ramsay Jones
2021-01-24 14:24           ` Ramsay Jones
2021-01-24 14:49             ` Ævar Arnfjörð Bjarmason
2021-01-24 16:10               ` Ramsay Jones
2021-01-24 17:29                 ` Ævar Arnfjörð Bjarmason
2021-01-24  2:12     ` [PATCH v3 1/4] grep/pcre2 tests: don't rely on invalid UTF-8 data test Ævar Arnfjörð Bjarmason
2021-01-24  2:12     ` [PATCH v3 2/4] grep/pcre2: simplify boolean spaghetti Ævar Arnfjörð Bjarmason
2021-01-24  5:33       ` Junio C Hamano
2021-01-24 10:45         ` Johannes Sixt
2021-01-24  2:12     ` [PATCH v3 3/4] grep/pcre2: further " Ævar Arnfjörð Bjarmason
2021-01-24  2:12     ` [PATCH v3 4/4] grep/pcre2: better support invalid UTF-8 haystacks Ævar Arnfjörð Bjarmason
2019-07-24 15:14 ` [PATCH 1/3] grep: remove overly paranoid BUG(...) code Ævar Arnfjörð Bjarmason
2019-07-24 15:14 ` [PATCH 2/3] grep: stop "using" a custom JIT stack with PCRE v2 Ævar Arnfjörð Bjarmason
2019-07-24 16:24   ` Junio C Hamano
2019-07-24 20:06     ` Ævar Arnfjörð Bjarmason
2019-07-25  5:11       ` Carlo Arenas
2019-07-24 15:14 ` [PATCH 3/3] grep: stop using a custom JIT stack with PCRE v1 Ævar Arnfjörð Bjarmason
2019-07-26 13:15   ` Carlo Arenas
2019-07-26 13:50     ` Ævar Arnfjörð Bjarmason
2019-07-26 14:12       ` Carlo Arenas
2019-07-26 14:43         ` Ævar Arnfjörð Bjarmason
2019-07-26 20:26           ` [RFC PATCH 0/2] PCRE1 cleanup Carlo Marcelo Arenas Belón
2019-07-26 20:26             ` [RFC PATCH 1/2] grep: make sure NO_LIBPCRE1_JIT disable JIT in PCRE1 Carlo Marcelo Arenas Belón
2019-07-26 20:26             ` [RFC PATCH 2/2] grep: refactor and simplify PCRE1 support Carlo Marcelo Arenas Belón

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20191130004653.8794-1-tmz@pobox.com \
    --to=tmz@pobox.com \
    --cc=Johannes.Schindelin@gmx.de \
    --cc=avarab@gmail.com \
    --cc=carenas@gmail.com \
    --cc=dev+git@drbeat.li \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=peff@peff.net \
    --cc=schwab@linux-m68k.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).