git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
Search results ordered by [date|relevance]  view[summary|nested|Atom feed]
thread overview below | download mbox.gz: |
* [PATCH v5 08/10] wildmatch test: create & test files on disk in addition to in-memory
    @ 2018-01-30 21:21  6%   ` Ævar Arnfjörð Bjarmason
  1 sibling, 0 replies; 2+ results
From: Ævar Arnfjörð Bjarmason @ 2018-01-30 21:21 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Nguyễn Thái Ngọc Duy,
	Anthony Ramine, Johannes Sixt, Adam Dinwoodie,
	Johannes Schindelin, Kyle J . McKay,
	Ævar Arnfjörð Bjarmason

There has never been any full roundtrip testing of what git-ls-files
and other commands that use wildmatch() actually do, rather we've been
satisfied with just testing the underlying C function.

Due to git-ls-files and friends having their own codepaths before they
call wildmatch() there's sometimes differences in the behavior between
the two. Even when we test for those (as with [1]), there was no one
place where you can review how these two modes differ.

Now there is. We now attempt to create a file called $haystack and
match $needle against it for each pair of $needle and $haystack that
we were passing to test-wildmatch.

If we can't create the file we skip the test. This ensures that we can
run this on all platforms and not maintain some infinitely growing
whitelist of e.g. platforms that don't support certain characters in
filenames.

A notable exception to this is Windows, where due to the reasons
explained in [2] the shellscript emulation layer might fake the
creation of a file such as "*", and "test -e" for it will succeed
since it just got created with some character that maps to "*", but
git ls-files won't be fooled by this.

Thus we need to skip creating certain filenames entirely on Windows,
the list here might be overly aggressive. I don't have access to a
Windows system to test this.

As a result of doing these tests we can now see the cases where these
two ways of testing wildmatch differ:

 * Creating a file called 'a[]b' and running ls-files 'a[]b' will show
   that file, but wildmatch("a[]b", "a[]b") will not match

 * wildmatch() won't match a file called \ against \, but ls-files
   will.

 * `git --glob-pathspecs ls-files 'foo**'` will match a file
   'foo/bba/arr', but wildmatch won't, however pathmatch will.

   This seems like a bug to me, the two are otherwise equivalent as
   these tests show.

This also reveals the case discussed in [1], since 2.16.0 '' is now an
error as far as ls-files is concerned, but wildmatch() itself happily
accepts it.

1. 9e4e8a64c2 ("pathspec: die on empty strings as pathspec",
   2017-06-06)

2. nycvar.QRO.7.76.6.1801052133380.1337@wbunaarf-fpuvaqryva.tvgsbejvaqbjf.bet
   (https://public-inbox.org/git/?q=nycvar.QRO.7.76.6.1801052133380.1337%40wbunaarf-fpuvaqryva.tvgsbejvaqbjf.bet)

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 t/t3070-wildmatch.sh | 201 ++++++++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 190 insertions(+), 11 deletions(-)

diff --git a/t/t3070-wildmatch.sh b/t/t3070-wildmatch.sh
index 3e75cb0cbe..bd11e5acb0 100755
--- a/t/t3070-wildmatch.sh
+++ b/t/t3070-wildmatch.sh
@@ -4,6 +4,72 @@ test_description='wildmatch tests'
 
 . ./test-lib.sh
 
+should_create_test_file() {
+	file=$1
+
+	case $file in
+	# `touch .` will succeed but obviously not do what we intend
+	# here.
+	".")
+		return 1
+		;;
+	# We cannot create a file with an empty filename.
+	"")
+		return 1
+		;;
+	# The tests that are testing that e.g. foo//bar is matched by
+	# foo/*/bar can't be tested on filesystems since there's no
+	# way we're getting a double slash.
+	*//*)
+		return 1
+		;;
+	# When testing the difference between foo/bar and foo/bar/ we
+	# can't test the latter.
+	*/)
+		return 1
+		;;
+	# On Windows, \ in paths is silently converted to /, which
+	# would result in the "touch" below working, but the test
+	# itself failing. See 6fd1106aa4 ("t3700: Skip a test with
+	# backslashes in pathspec", 2009-03-13) for prior art and
+	# details.
+	*\\*)
+		if ! test_have_prereq BSLASHPSPEC
+		then
+			return 1
+		fi
+		# NOTE: The ;;& bash extension is not portable, so
+		# this test needs to be at the end of the pattern
+		# list.
+		#
+		# If we want to add more conditional returns we either
+		# need a new case statement, or turn this whole thing
+		# into a series of "if" tests.
+		;;
+	esac
+
+
+	# On Windows proper (i.e. not Cygwin) many file names which
+	# under Cygwin would be emulated don't work.
+	if test_have_prereq MINGW
+	then
+		case $file in
+		" ")
+			# Files called " " are forbidden on Windows
+			return 1
+			;;
+		*\<*|*\>*|*:*|*\"*|*\|*|*\?*|*\**)
+			# Files with various special characters aren't
+			# allowed on Windows. Sourced from
+			# https://stackoverflow.com/a/31976060
+			return 1
+			;;
+		esac
+	fi
+
+	return 0
+}
+
 match_with_function() {
 	text=$1
 	pattern=$2
@@ -26,25 +92,133 @@ match_with_function() {
 
 }
 
+match_with_ls_files() {
+	text=$1
+	pattern=$2
+	match_expect=$3
+	match_function=$4
+	ls_files_args=$5
+
+	match_stdout_stderr_cmp="
+		tr -d '\0' <actual.raw >actual &&
+		>expect.err &&
+		test_cmp expect.err actual.err &&
+		test_cmp expect actual"
+
+	if test "$match_expect" = 'E'
+	then
+		if test -e .git/created_test_file
+		then
+			test_expect_success "$match_function (via ls-files): match dies on '$pattern' '$text'" "
+				printf '%s' '$text' >expect &&
+				test_must_fail git$ls_files_args ls-files -z -- '$pattern'
+			"
+		else
+			test_expect_failure "$match_function (via ls-files): match skip '$pattern' '$text'" 'false'
+		fi
+	elif test "$match_expect" = 1
+	then
+		if test -e .git/created_test_file
+		then
+			test_expect_success "$match_function (via ls-files): match '$pattern' '$text'" "
+				printf '%s' '$text' >expect &&
+				git$ls_files_args ls-files -z -- '$pattern' >actual.raw 2>actual.err &&
+				$match_stdout_stderr_cmp
+			"
+		else
+			test_expect_failure "$match_function (via ls-files): match skip '$pattern' '$text'" 'false'
+		fi
+	elif test "$match_expect" = 0
+	then
+		if test -e .git/created_test_file
+		then
+			test_expect_success "$match_function (via ls-files): no match '$pattern' '$text'" "
+				>expect &&
+				git$ls_files_args ls-files -z -- '$pattern' >actual.raw 2>actual.err &&
+				$match_stdout_stderr_cmp
+			"
+		else
+			test_expect_failure "$match_function (via ls-files): no match skip '$pattern' '$text'" 'false'
+		fi
+	else
+		test_expect_success "PANIC: Test framework error. Unknown matches value $match_expect" 'false'
+	fi
+}
+
 match() {
-	match_glob=$1
-	match_iglob=$2
-	match_pathmatch=$3
-	match_pathmatchi=$4
-	text=$5
-	pattern=$6
+	if test "$#" = 6
+	then
+		# When test-wildmatch and git ls-files produce the same
+		# result.
+		match_glob=$1
+		match_file_glob=$match_glob
+		match_iglob=$2
+		match_file_iglob=$match_iglob
+		match_pathmatch=$3
+		match_file_pathmatch=$match_pathmatch
+		match_pathmatchi=$4
+		match_file_pathmatchi=$match_pathmatchi
+		text=$5
+		pattern=$6
+	elif test "$#" = 10
+	then
+		match_glob=$1
+		match_iglob=$2
+		match_pathmatch=$3
+		match_pathmatchi=$4
+		match_file_glob=$5
+		match_file_iglob=$6
+		match_file_pathmatch=$7
+		match_file_pathmatchi=$8
+		text=$9
+		pattern=${10}
+	fi
+
+	test_expect_success 'cleanup after previous file test' '
+		if test -e .git/created_test_file
+		then
+			git reset &&
+			git clean -df
+		fi
+	'
+
+	printf '%s' "$text" >.git/expected_test_file
+
+	test_expect_success "setup match file test for $text" '
+		file=$(cat .git/expected_test_file) &&
+		if should_create_test_file "$file"
+		then
+			dirs=${file%/*}
+			if test "$file" != "$dirs"
+			then
+				mkdir -p -- "$dirs" &&
+				touch -- "./$text"
+			else
+				touch -- "./$file"
+			fi &&
+			git add -A &&
+			printf "%s" "$file" >.git/created_test_file
+		elif test -e .git/created_test_file
+		then
+			rm .git/created_test_file
+		fi
+	'
 
 	# $1: Case sensitive glob match: test-wildmatch & ls-files
 	match_with_function "$text" "$pattern" $match_glob "wildmatch"
+	match_with_ls_files "$text" "$pattern" $match_file_glob "wildmatch" " --glob-pathspecs"
 
 	# $2: Case insensitive glob match: test-wildmatch & ls-files
 	match_with_function "$text" "$pattern" $match_iglob "iwildmatch"
+	match_with_ls_files "$text" "$pattern" $match_file_iglob "iwildmatch" " --glob-pathspecs --icase-pathspecs"
 
 	# $3: Case sensitive path match: test-wildmatch & ls-files
 	match_with_function "$text" "$pattern" $match_pathmatch "pathmatch"
+	match_with_ls_files "$text" "$pattern" $match_file_pathmatch "pathmatch" ""
 
 	# $4: Case insensitive path match: test-wildmatch & ls-files
 	match_with_function "$text" "$pattern" $match_pathmatchi "ipathmatch"
+	match_with_ls_files "$text" "$pattern" $match_file_pathmatchi "ipathmatch" " --icase-pathspecs"
 }
 
 # Basic wildmatch features
@@ -113,7 +287,8 @@ match 1 1 1 1 'acrt' 'a[c-c]rt'
 match 0 0 0 0 ']' '[!]-]'
 match 1 1 1 1 'a' '[!]-]'
 match 0 0 0 0 '' '\'
-match 0 0 0 0 '\' '\'
+match 0 0 0 0 \
+      1 1 1 1 '\' '\'
 match 0 0 0 0 'XXX/\' '*/\'
 match 1 1 1 1 'XXX/\' '*/\\'
 match 1 1 1 1 'foo' 'foo'
@@ -127,7 +302,8 @@ match 1 1 1 1 '[ab]' '[[:digit]ab]'
 match 1 1 1 1 '[ab]' '[\[:]ab]'
 match 1 1 1 1 '?a?b' '\??\?b'
 match 1 1 1 1 'abc' '\a\b\c'
-match 0 0 0 0 'foo' ''
+match 0 0 0 0 \
+      E E E E 'foo' ''
 match 1 1 1 1 'foo/bar/baz/to' '**/t[o]'
 
 # Character class tests
@@ -157,8 +333,10 @@ match 1 1 1 1 ']' '[\]]'
 match 0 0 0 0 '\]' '[\]]'
 match 0 0 0 0 '\' '[\]]'
 match 0 0 0 0 'ab' 'a[]b'
-match 0 0 0 0 'a[]b' 'a[]b'
-match 0 0 0 0 'ab[' 'ab['
+match 0 0 0 0 \
+      1 1 1 1 'a[]b' 'a[]b'
+match 0 0 0 0 \
+      1 1 1 1 'ab[' 'ab['
 match 0 0 0 0 'ab' '[!'
 match 0 0 0 0 'ab' '[-'
 match 1 1 1 1 '-' '[-]'
@@ -226,7 +404,8 @@ match 1 1 1 1 foo/bar 'foo/*'
 match 0 0 1 1 foo/bba/arr 'foo/*'
 match 1 1 1 1 foo/bba/arr 'foo/**'
 match 0 0 1 1 foo/bba/arr 'foo*'
-match 0 0 1 1 foo/bba/arr 'foo**'
+match 0 0 1 1 \
+      1 1 1 1 foo/bba/arr 'foo**'
 match 0 0 1 1 foo/bba/arr 'foo/*arr'
 match 0 0 1 1 foo/bba/arr 'foo/**arr'
 match 0 0 0 0 foo/bba/arr 'foo/*z'
-- 
2.15.1.424.g9478a66081


^ permalink raw reply related	[relevance 6%]

* Re: [PATCH v4 7/7] wildmatch test: create & test files on disk in addition to in-memory
  @ 2018-01-05 20:48  7%         ` Johannes Schindelin
  0 siblings, 0 replies; 2+ results
From: Johannes Schindelin @ 2018-01-05 20:48 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Junio C Hamano, Nguyễn Thái Ngọc Duy,
	Anthony Ramine, Johannes Sixt, Adam Dinwoodie

[-- Attachment #1: Type: text/plain, Size: 6229 bytes --]

Hi Ævar,

On Fri, 5 Jan 2018, Ævar Arnfjörð Bjarmason wrote:

> On Fri, Jan 05 2018, Johannes Schindelin jotted:
> 
> > [...]
> >
> > In short: the Unix shell script t3070 manages to write what it thinks is a
> > file called 'foo*', but Git only sees 'foo<some-undisplayable-character>'.
> >
> > I tried to address this problem with this patch:
> 
> ...I don't see any particular value in trying to do these full roundtrip
> tests on platforms like Windows. Perhaps we should just do these on a
> whitelist of POSIX systems for now, and leave expanding that list to
> some future step.

I don't think so. Windows is already handled as a second-class citizen, as
if nobody developed on it. As a consequence, only very few of the
gazillions of Windows developers... develop Git. We could worsify the
situation, of course, but why? Shouldn't we at least pretend to try the
opposite?

> > -- snip --
> > diff --git a/t/t3070-wildmatch.sh b/t/t3070-wildmatch.sh
> > index f606f91acbb..51dcb675e7b 100755
> > --- a/t/t3070-wildmatch.sh
> > +++ b/t/t3070-wildmatch.sh
> > @@ -4,6 +4,14 @@ test_description='wildmatch tests'
> >
> >  . ./test-lib.sh
> >
> > +if test_have_prereq !MINGW && touch -- 'a*b' 2>/dev/null
> > +then
> > +	test_set_prereq FILENAMESWITHSTARS
> > +else
> > +	say 'Your filesystem does not allow stars in filenames.'
> > +fi
> > +rm -f 'a*b'
> > +
> >  create_test_file() {
> >  	file=$1
> >
> > @@ -28,6 +36,17 @@ create_test_file() {
> >  	*/)
> >  		return 1
> >  		;;
> > +	# On Windows, stars are not allowed in filenames. Git for Windows'
> > +	# Bash, however, is based on Cygwin which plays funny names with a
> > +	# private Unicode page to emulate stars in filenames. Meaning that
> > +	# the shell script will "succeed" to write the file, but Git will
> > +	# not see it. Nor any other, regular Windows process.
> > +	*\**|*\[*)
> > +		if ! test_have_prereq FILENAMESWITHSTARS
> > +		then
> > +			return 1
> > +		fi
> > +		;;
> >  	# On Windows, \ in paths is silently converted to /, which
> >  	# would result in the "touch" below working, but the test
> >  	# itself failing. See 6fd1106aa4 ("t3700: Skip a test with
> > -- snap --
> >
> > This gets us further. But not by much:
> 
> Okey, that's very weird. So you can:
> 
>     touch "./*"; echo $?
> 
> And it'll return 0 but then the file won't exist?

Almost. The file exists, but it won't have the name '*'. It will have as
name a Unicode character that is in a private page, not standardized by
the Unicode specification.

> How about this:
> 
>     touch "./*" && test -e "./*"; echo $?

Would return 0. Why? Because *you are still in a Unix shell script, so the
Cygwin cuteness still applies*.

> The reason this latest version stopped creating files with "\" in them
> unless under BSLASHPSPEC is because Cygwin would silently translate it,
> so it would create the file but it would actually mean something the
> tests didn't expect.

I understand that. And I would wish that the test would be designed in a
more cross-platform-aware mindset.

> For anything else, such as stars not being allowed in filenames I was
> expecting that "touch" command to return an error, but if that's not the
> case maybe we need the "test -e" as well, unless I'm missing something
> here.

This is one of the many bad consequences of Git relying so much on Unix
shell scripting. Despite what many, many, many Git developers think: shell
scripting is not portable.

Cygwin does a good job of pretending that it is, and MSYS2 exacerbates
that notion, but it comes back to haunt you right here and right now. The
`touch` invocation will *report success*, but it will have done something
different than you wanted. It's like the Thinking: Fast and Slow.

> > fatal: Invalid path '\[ab]': No such file or directory
> >
> > You see, any path starting with a backslash *is* an absolute path on
> > Windows. It is relative to the current drive.
> 
> Right, which I was trying to avoid by not actually creating \[ab], but
> "./\[ab]", is that the same filename on Windows?

Whoops. I managed to copy-paste the *wrong* command's error message. Sorry
about that. The correct one:

$ git --glob-pathspecs ls-files -z -- '\[ab]'
fatal: \[ab]: '\[ab]' is outside repository

Note how it is Git reporting that you asked for a path that is outside?
That's because it thinks you are referring to C:\[ab] (if your current
directory is on the C: drive).

And it would be correct to complain on Windows: the `\[ab]` parameter
refers to an absolute path.

> > This affects *quite* a few of the test cases you added.
> >
> > And even if I just comment all of those out, I run into the next problem
> > where you try to create a file whose name consists of a single space,
> > which is also illegal on Windows.
> 
> Okey, but ditto above about touch not catching it, does:
> 
>     touch "./ "; echo $?
> 
> Not return an error? Then how about:
> 
>     touch "./ " && test -e "./ "; echo $?

Again: as long as you stay within the bounds of the Unix shell script (did
I point out enough yet how non-portable this design is? Even Subversion
knew better than to implement parts of its operations as Unix shell
scripts. I mean, for PoCing, okay, but for production code?) you fall prey
to Cygwin's emulation of POSIX-y filenames. As soon as you leave that
bubble (e.g. by calling git.exe), you're not going to see those illegal
file names, but the ones with the unprintable Unicode characters.

> > These woes demonstrate one problem with the approach of overzealously
> > testing *everything*[...]
> 
> I think the rest of this gets into topics I've covered above. I.e. that
> this increased test coverage has caught bugs.

That's all good and dandy, but what about regressions? I know how much I
will curse in your vague direction when I encounter the next
wildmatch-related bug in, say, half a year and have to wade through the
jungle of unintuitive tests in t3070.

Can't we do a lot better than this? Shouldn't it be a lot more obvious
what the heck went wrong when running t3070 with -i -v -x?

Ciao,
Johannes

^ permalink raw reply	[relevance 7%]

Results 1-2 of 2 | reverse | options above
-- pct% links below jump to the message on this page, permalinks otherwise --
2018-01-30 21:21     [PATCH v5 00/10] increase wildmatch test coverage Ævar Arnfjörð Bjarmason
2018-01-04 19:26     ` [PATCH v4 0/7] " Ævar Arnfjörð Bjarmason
2018-01-04 19:26       ` [PATCH v4 7/7] wildmatch test: create & test files on disk in addition to in-memory Ævar Arnfjörð Bjarmason
2018-01-05 16:41         ` Johannes Schindelin
2018-01-05 19:08           ` Ævar Arnfjörð Bjarmason
2018-01-05 20:48  7%         ` Johannes Schindelin
2018-01-30 21:21  6%   ` [PATCH v5 08/10] " Ævar Arnfjörð Bjarmason

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).