git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: "Johannes Schindelin via GitGitGadget" <gitgitgadget@gmail.com>
To: git@vger.kernel.org
Cc: "René Scharfe" <l.s.r@web.de>,
	"Ævar Arnfjörð Bjarmason" <avarab@gmail.com>,
	"Johannes Schindelin" <johannes.schindelin@gmx.de>,
	"Johannes Schindelin" <johannes.schindelin@gmx.de>
Subject: [PATCH v5 2/2] tests(mingw): avoid very slow `mingw_test_cmp`
Date: Tue, 06 Dec 2022 15:07:47 +0000	[thread overview]
Message-ID: <6a80fab7e3936ec56e1583d6136d47487327e907.1670339267.git.gitgitgadget@gmail.com> (raw)
In-Reply-To: <pull.1309.v5.git.1670339267.gitgitgadget@gmail.com>

From: Johannes Schindelin <johannes.schindelin@gmx.de>

When Git's test suite uses `test_cmp`, it is not actually trying to
compare binary files as the name `cmp` would suggest to users familiar
with Unix' tools, but the tests instead verify that actual output
matches the expected text.

On Unix, `cmp` works well enough for Git's purposes because only Line
Feed characters are used as line endings. However, on Windows, while
most tools accept Line Feeds as line endings, many tools produce
Carriage Return + Line Feed line endings, including some of the tools
used by the test suite (which are therefore provided via Git for Windows
SDK). Therefore, `cmp` would frequently fail merely due to different
line endings.

To accommodate for that, the `mingw_test_cmp` function was introduced
into Git's test suite to perform a line-by-line comparison that ignores
line endings. This function is a Bash function that is only used on
Windows, everywhere else `cmp` is used.

This is a double whammy because `cmp` is fast, and `mingw_test_cmp` is
slow, even more so on Windows because it is a Bash script function, and
Bash scripts are known to run particularly slowly on Windows due to
Bash's need for the POSIX emulation layer provided by the MSYS2 runtime.

The commit message of 32ed3314c104 (t5351: avoid using `test_cmp` for
binary data, 2022-07-29) provides an illuminating account of the
consequences: On Windows, the platform on which Git could really use all
the help it can get to improve its performance, the time spent on one
entire test script was reduced from half an hour to less than half a
minute merely by avoiding a single call to `mingw_test_cmp` in but a
single test case.

Learning the lesson to avoid shell scripting wherever possible, the Git
for Windows project implemented a minimal replacement for
`mingw_test_cmp` in the form of a `test-tool` subcommand that parses the
input files line by line, ignoring line endings, and compares them.
Essentially the same thing as `mingw_test_cmp`, but implemented in
C instead of Bash. This solution served the Git for Windows project
well, over years.

However, when this solution was finally upstreamed, the conclusion was
reached that a change to use `git diff --no-index` instead of
`mingw_test_cmp` was more easily reviewed and hence should be used
instead.

The reason why this approach was not even considered in Git for Windows
is that in 2007, there was already a motion on the table to use Git's
own diff machinery to perform comparisons in Git's test suite, but it
was dismissed in https://lore.kernel.org/git/xmqqbkrpo9or.fsf@gitster.g/
as undesirable because tests might potentially succeed due to bugs in
the diff machinery when they should not succeed, and those bugs could
therefore hide regressions that the tests try to prevent.

By the time Git for Windows' `mingw-test-cmp` in C was finally
contributed to the Git mailing list, reviewers agreed that the diff
machinery had matured enough and should be used instead.

When the concern was raised that the diff machinery, due to its
complexity, would perform substantially worse than the test helper
originally implemented in the Git for Windows project, a test
demonstrated that these performance differences are well lost within the
100+ minutes it takes to run Git's test suite on Windows.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 t/test-lib-functions.sh | 66 -----------------------------------------
 t/test-lib.sh           |  2 +-
 2 files changed, 1 insertion(+), 67 deletions(-)

diff --git a/t/test-lib-functions.sh b/t/test-lib-functions.sh
index 8c44856eaec..452fe9bc8aa 100644
--- a/t/test-lib-functions.sh
+++ b/t/test-lib-functions.sh
@@ -1541,72 +1541,6 @@ test_skip_or_die () {
 	error "$2"
 }
 
-# The following mingw_* functions obey POSIX shell syntax, but are actually
-# bash scripts, and are meant to be used only with bash on Windows.
-
-# A test_cmp function that treats LF and CRLF equal and avoids to fork
-# diff when possible.
-mingw_test_cmp () {
-	# Read text into shell variables and compare them. If the results
-	# are different, use regular diff to report the difference.
-	local test_cmp_a= test_cmp_b=
-
-	# When text came from stdin (one argument is '-') we must feed it
-	# to diff.
-	local stdin_for_diff=
-
-	# Since it is difficult to detect the difference between an
-	# empty input file and a failure to read the files, we go straight
-	# to diff if one of the inputs is empty.
-	if test -s "$1" && test -s "$2"
-	then
-		# regular case: both files non-empty
-		mingw_read_file_strip_cr_ test_cmp_a <"$1"
-		mingw_read_file_strip_cr_ test_cmp_b <"$2"
-	elif test -s "$1" && test "$2" = -
-	then
-		# read 2nd file from stdin
-		mingw_read_file_strip_cr_ test_cmp_a <"$1"
-		mingw_read_file_strip_cr_ test_cmp_b
-		stdin_for_diff='<<<"$test_cmp_b"'
-	elif test "$1" = - && test -s "$2"
-	then
-		# read 1st file from stdin
-		mingw_read_file_strip_cr_ test_cmp_a
-		mingw_read_file_strip_cr_ test_cmp_b <"$2"
-		stdin_for_diff='<<<"$test_cmp_a"'
-	fi
-	test -n "$test_cmp_a" &&
-	test -n "$test_cmp_b" &&
-	test "$test_cmp_a" = "$test_cmp_b" ||
-	eval "diff -u \"\$@\" $stdin_for_diff"
-}
-
-# $1 is the name of the shell variable to fill in
-mingw_read_file_strip_cr_ () {
-	# Read line-wise using LF as the line separator
-	# and use IFS to strip CR.
-	local line
-	while :
-	do
-		if IFS=$'\r' read -r -d $'\n' line
-		then
-			# good
-			line=$line$'\n'
-		else
-			# we get here at EOF, but also if the last line
-			# was not terminated by LF; in the latter case,
-			# some text was read
-			if test -z "$line"
-			then
-				# EOF, really
-				break
-			fi
-		fi
-		eval "$1=\$$1\$line"
-	done
-}
-
 # Like "env FOO=BAR some-program", but run inside a subshell, which means
 # it also works for shell functions (though those functions cannot impact
 # the environment outside of the test_env invocation).
diff --git a/t/test-lib.sh b/t/test-lib.sh
index 7726d1da88a..f8c6205e08f 100644
--- a/t/test-lib.sh
+++ b/t/test-lib.sh
@@ -1546,7 +1546,7 @@ case $uname_s in
 	test_set_prereq SED_STRIPS_CR
 	test_set_prereq GREP_STRIPS_CR
 	test_set_prereq WINDOWS
-	GIT_TEST_CMP=mingw_test_cmp
+	GIT_TEST_CMP="GIT_DIR=/dev/null git diff --no-index --ignore-cr-at-eol --"
 	;;
 *CYGWIN*)
 	test_set_prereq POSIXPERM
-- 
gitgitgadget

  parent reply	other threads:[~2022-12-06 15:11 UTC|newest]

Thread overview: 58+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-07-29 14:53 [PATCH] tests: replace mingw_test_cmp with a helper in C Johannes Schindelin via GitGitGadget
2022-07-29 14:54 ` Johannes Schindelin
2022-07-29 16:44 ` Junio C Hamano
2022-09-06 13:10   ` Johannes Schindelin
2022-09-07 12:09     ` René Scharfe
2022-09-07 16:25       ` Junio C Hamano
2022-09-07 21:45         ` Re* " Junio C Hamano
2022-09-07 22:39           ` René Scharfe
2022-09-08  0:03             ` Junio C Hamano
2022-09-08  8:59         ` René Scharfe
2022-09-08 15:26           ` Ævar Arnfjörð Bjarmason
2022-09-08 20:54         ` Johannes Schindelin
2022-09-08 21:09           ` Junio C Hamano
2022-09-06 13:10 ` [PATCH v2 0/2] " Johannes Schindelin via GitGitGadget
2022-09-06 13:10   ` [PATCH v2 1/2] t0021: use Windows-friendly `pwd` Johannes Schindelin via GitGitGadget
2022-09-06 13:10   ` [PATCH v2 2/2] tests: replace mingw_test_cmp with a helper in C Johannes Schindelin via GitGitGadget
2022-09-07 11:57     ` Ævar Arnfjörð Bjarmason
2022-09-07 12:24       ` Ævar Arnfjörð Bjarmason
2022-09-07 19:45         ` Junio C Hamano
2022-09-07  9:04   ` [PATCH v2 0/2] " Johannes Schindelin
2022-11-12 22:07   ` [PATCH v3 " Johannes Schindelin via GitGitGadget
2022-11-12 22:07     ` [PATCH v3 1/2] t0021: use Windows-friendly `pwd` Johannes Schindelin via GitGitGadget
2022-11-12 22:07     ` [PATCH v3 2/2] tests(mingw): avoid very slow `mingw_test_cmp` Johannes Schindelin via GitGitGadget
2022-11-13  4:51       ` Taylor Blau
2022-11-14 13:34         ` Johannes Schindelin
2022-11-18 23:15         ` Junio C Hamano
2022-11-19  2:53           ` Taylor Blau
2022-11-19 12:03             ` Ævar Arnfjörð Bjarmason
2022-11-19  8:18           ` Johannes Sixt
2022-11-19 17:50             ` René Scharfe
2022-11-20  9:29               ` Torsten Bögershausen
2022-11-21 17:49               ` Johannes Sixt
2022-11-21  3:13             ` Junio C Hamano
2022-11-14  9:53       ` Phillip Wood
2022-11-14 13:47         ` Johannes Schindelin
2022-11-14 11:55       ` Ævar Arnfjörð Bjarmason
2022-11-14 14:02         ` Johannes Schindelin
2022-11-14 15:23           ` Ævar Arnfjörð Bjarmason
2022-11-18 23:19             ` Junio C Hamano
2022-11-19  2:56               ` Taylor Blau
2022-11-19 11:54                 ` Ævar Arnfjörð Bjarmason
2022-11-21  3:17                   ` Junio C Hamano
2022-11-14 14:06     ` [PATCH v4 0/2] tests(mingw): avoid super-slow mingw_test_cmp Johannes Schindelin via GitGitGadget
2022-11-14 14:06       ` [PATCH v4 1/2] t0021: use Windows-friendly `pwd` Johannes Schindelin via GitGitGadget
2022-11-14 14:06       ` [PATCH v4 2/2] tests(mingw): avoid very slow `mingw_test_cmp` Johannes Schindelin via GitGitGadget
2022-11-14 22:40         ` Taylor Blau
2022-11-18 13:32           ` Johannes Schindelin
2022-11-18 18:14             ` Taylor Blau
2022-11-20 23:36               ` Johannes Schindelin
2022-11-21  0:07                 ` Taylor Blau
2022-12-06 15:07       ` [PATCH v5 0/2] tests(mingw): avoid super-slow mingw_test_cmp Johannes Schindelin via GitGitGadget
2022-12-06 15:07         ` [PATCH v5 1/2] t0021: use Windows-friendly `pwd` Johannes Schindelin via GitGitGadget
2022-12-06 15:07         ` Johannes Schindelin via GitGitGadget [this message]
2022-12-06 18:55           ` [PATCH v5 2/2] tests(mingw): avoid very slow `mingw_test_cmp` Ævar Arnfjörð Bjarmason
2022-12-06 21:52           ` Johannes Sixt
2022-12-06 21:54           ` René Scharfe
2022-12-07  4:33             ` Junio C Hamano
2022-12-07  1:31           ` Taylor Blau

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=6a80fab7e3936ec56e1583d6136d47487327e907.1670339267.git.gitgitgadget@gmail.com \
    --to=gitgitgadget@gmail.com \
    --cc=avarab@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=johannes.schindelin@gmx.de \
    --cc=l.s.r@web.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).