From: "Johannes Schindelin via GitGitGadget" <gitgitgadget@gmail.com>
To: git@vger.kernel.org
Cc: "René Scharfe" <l.s.r@web.de>,
"Ævar Arnfjörð Bjarmason" <avarab@gmail.com>,
"Johannes Schindelin" <johannes.schindelin@gmx.de>
Subject: [PATCH v5 0/2] tests(mingw): avoid super-slow mingw_test_cmp
Date: Tue, 06 Dec 2022 15:07:45 +0000 [thread overview]
Message-ID: <pull.1309.v5.git.1670339267.gitgitgadget@gmail.com> (raw)
In-Reply-To: <pull.1309.v4.git.1668434812.gitgitgadget@gmail.com>
Half a year ago, directly after sending a patch to fix a performance
regression due to a mis-use of test_cmp
[https://lore.kernel.org/git/b9203ea247776332e4b6f519aa27d541207adc2f.1659097724.git.gitgitgadget@gmail.com/],
I got curious to see whether Git for Windows had the same issue. And it did
not: it passed t5351 in 22 seconds, even while using test_cmp to compare
pack files
[https://github.com/git-for-windows/git/blob/3922f62f0d5991e9fe0a0817ebf89a91339c7705/t/t5351-unpack-large-objects.sh#L90].
This motivated me to upstream Git for Windows' changes.
Changes since v4:
* The commit message was rewritten almost completely.
Changes since v3:
* Fixed the subject of the cover letter (which should have been adjusted in
v3)
* Elaborated the paragraph about the historical context of this patch
Changes since v2:
* Dropped the test helper, using diff --no-index instead.
Changes since v1:
* Fixed double "with" in the commit message.
* Renamed the test helper to text-cmp.
* Made the diff --no-index call more robust by using a double-dash
separator.
Johannes Schindelin (2):
t0021: use Windows-friendly `pwd`
tests(mingw): avoid very slow `mingw_test_cmp`
t/t0021-conversion.sh | 4 +--
t/test-lib-functions.sh | 66 -----------------------------------------
t/test-lib.sh | 2 +-
3 files changed, 3 insertions(+), 69 deletions(-)
base-commit: 23b219f8e3f2adfb0441e135f0a880e6124f766c
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1309%2Fdscho%2Fmingw-test-cmp-v5
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1309/dscho/mingw-test-cmp-v5
Pull-Request: https://github.com/gitgitgadget/git/pull/1309
Range-diff vs v4:
1: b38b8fb5a85 = 1: b38b8fb5a85 t0021: use Windows-friendly `pwd`
2: 128b1f348d8 ! 2: 6a80fab7e39 tests(mingw): avoid very slow `mingw_test_cmp`
@@ Metadata
## Commit message ##
tests(mingw): avoid very slow `mingw_test_cmp`
- It is more performant to run `git diff --no-index` than running the
- `mingw_test_cmp` code with MSYS2's Bash, i.e. the Bash that Git for
- Windows uses. And a lot more readable.
+ When Git's test suite uses `test_cmp`, it is not actually trying to
+ compare binary files as the name `cmp` would suggest to users familiar
+ with Unix' tools, but the tests instead verify that actual output
+ matches the expected text.
- The original reason why Git's test suite needs the `mingw_test_cmp`
- function at all (and why `cmp` is not good enough) is that Git's test
- suite is not actually trying to compare binary files when it calls
- `test_cmp`, but it compares text files. And those text files can contain
- CR/LF line endings depending on the circumstances.
+ On Unix, `cmp` works well enough for Git's purposes because only Line
+ Feed characters are used as line endings. However, on Windows, while
+ most tools accept Line Feeds as line endings, many tools produce
+ Carriage Return + Line Feed line endings, including some of the tools
+ used by the test suite (which are therefore provided via Git for Windows
+ SDK). Therefore, `cmp` would frequently fail merely due to different
+ line endings.
- Note: The original fix in the Git for Windows project implemented a test
- helper that avoids the overhead of the diff machinery, in favor of
- implementing a behavior that is more in line with what `mingw_test_cmp`
- does now. This was done to minimize the risk in using something as
- complex as the diff machinery to perform something as simple as
- determining whether text output is identical to the expected output or
- not. This approach has served Git for Windows well for years, but the
- attempt to upstream this saw a lot of backlash and distractions during
- the review, was disliked by the Git maintainer and was therefore
- abandoned. For full details, see the thread at
- https://lore.kernel.org/git/pull.1309.git.1659106382128.gitgitgadget@gmail.com/t
+ To accommodate for that, the `mingw_test_cmp` function was introduced
+ into Git's test suite to perform a line-by-line comparison that ignores
+ line endings. This function is a Bash function that is only used on
+ Windows, everywhere else `cmp` is used.
+
+ This is a double whammy because `cmp` is fast, and `mingw_test_cmp` is
+ slow, even more so on Windows because it is a Bash script function, and
+ Bash scripts are known to run particularly slowly on Windows due to
+ Bash's need for the POSIX emulation layer provided by the MSYS2 runtime.
+
+ The commit message of 32ed3314c104 (t5351: avoid using `test_cmp` for
+ binary data, 2022-07-29) provides an illuminating account of the
+ consequences: On Windows, the platform on which Git could really use all
+ the help it can get to improve its performance, the time spent on one
+ entire test script was reduced from half an hour to less than half a
+ minute merely by avoiding a single call to `mingw_test_cmp` in but a
+ single test case.
+
+ Learning the lesson to avoid shell scripting wherever possible, the Git
+ for Windows project implemented a minimal replacement for
+ `mingw_test_cmp` in the form of a `test-tool` subcommand that parses the
+ input files line by line, ignoring line endings, and compares them.
+ Essentially the same thing as `mingw_test_cmp`, but implemented in
+ C instead of Bash. This solution served the Git for Windows project
+ well, over years.
+
+ However, when this solution was finally upstreamed, the conclusion was
+ reached that a change to use `git diff --no-index` instead of
+ `mingw_test_cmp` was more easily reviewed and hence should be used
+ instead.
+
+ The reason why this approach was not even considered in Git for Windows
+ is that in 2007, there was already a motion on the table to use Git's
+ own diff machinery to perform comparisons in Git's test suite, but it
+ was dismissed in https://lore.kernel.org/git/xmqqbkrpo9or.fsf@gitster.g/
+ as undesirable because tests might potentially succeed due to bugs in
+ the diff machinery when they should not succeed, and those bugs could
+ therefore hide regressions that the tests try to prevent.
+
+ By the time Git for Windows' `mingw-test-cmp` in C was finally
+ contributed to the Git mailing list, reviewers agreed that the diff
+ machinery had matured enough and should be used instead.
+
+ When the concern was raised that the diff machinery, due to its
+ complexity, would perform substantially worse than the test helper
+ originally implemented in the Git for Windows project, a test
+ demonstrated that these performance differences are well lost within the
+ 100+ minutes it takes to run Git's test suite on Windows.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
--
gitgitgadget
next prev parent reply other threads:[~2022-12-06 15:11 UTC|newest]
Thread overview: 58+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-07-29 14:53 [PATCH] tests: replace mingw_test_cmp with a helper in C Johannes Schindelin via GitGitGadget
2022-07-29 14:54 ` Johannes Schindelin
2022-07-29 16:44 ` Junio C Hamano
2022-09-06 13:10 ` Johannes Schindelin
2022-09-07 12:09 ` René Scharfe
2022-09-07 16:25 ` Junio C Hamano
2022-09-07 21:45 ` Re* " Junio C Hamano
2022-09-07 22:39 ` René Scharfe
2022-09-08 0:03 ` Junio C Hamano
2022-09-08 8:59 ` René Scharfe
2022-09-08 15:26 ` Ævar Arnfjörð Bjarmason
2022-09-08 20:54 ` Johannes Schindelin
2022-09-08 21:09 ` Junio C Hamano
2022-09-06 13:10 ` [PATCH v2 0/2] " Johannes Schindelin via GitGitGadget
2022-09-06 13:10 ` [PATCH v2 1/2] t0021: use Windows-friendly `pwd` Johannes Schindelin via GitGitGadget
2022-09-06 13:10 ` [PATCH v2 2/2] tests: replace mingw_test_cmp with a helper in C Johannes Schindelin via GitGitGadget
2022-09-07 11:57 ` Ævar Arnfjörð Bjarmason
2022-09-07 12:24 ` Ævar Arnfjörð Bjarmason
2022-09-07 19:45 ` Junio C Hamano
2022-09-07 9:04 ` [PATCH v2 0/2] " Johannes Schindelin
2022-11-12 22:07 ` [PATCH v3 " Johannes Schindelin via GitGitGadget
2022-11-12 22:07 ` [PATCH v3 1/2] t0021: use Windows-friendly `pwd` Johannes Schindelin via GitGitGadget
2022-11-12 22:07 ` [PATCH v3 2/2] tests(mingw): avoid very slow `mingw_test_cmp` Johannes Schindelin via GitGitGadget
2022-11-13 4:51 ` Taylor Blau
2022-11-14 13:34 ` Johannes Schindelin
2022-11-18 23:15 ` Junio C Hamano
2022-11-19 2:53 ` Taylor Blau
2022-11-19 12:03 ` Ævar Arnfjörð Bjarmason
2022-11-19 8:18 ` Johannes Sixt
2022-11-19 17:50 ` René Scharfe
2022-11-20 9:29 ` Torsten Bögershausen
2022-11-21 17:49 ` Johannes Sixt
2022-11-21 3:13 ` Junio C Hamano
2022-11-14 9:53 ` Phillip Wood
2022-11-14 13:47 ` Johannes Schindelin
2022-11-14 11:55 ` Ævar Arnfjörð Bjarmason
2022-11-14 14:02 ` Johannes Schindelin
2022-11-14 15:23 ` Ævar Arnfjörð Bjarmason
2022-11-18 23:19 ` Junio C Hamano
2022-11-19 2:56 ` Taylor Blau
2022-11-19 11:54 ` Ævar Arnfjörð Bjarmason
2022-11-21 3:17 ` Junio C Hamano
2022-11-14 14:06 ` [PATCH v4 0/2] tests(mingw): avoid super-slow mingw_test_cmp Johannes Schindelin via GitGitGadget
2022-11-14 14:06 ` [PATCH v4 1/2] t0021: use Windows-friendly `pwd` Johannes Schindelin via GitGitGadget
2022-11-14 14:06 ` [PATCH v4 2/2] tests(mingw): avoid very slow `mingw_test_cmp` Johannes Schindelin via GitGitGadget
2022-11-14 22:40 ` Taylor Blau
2022-11-18 13:32 ` Johannes Schindelin
2022-11-18 18:14 ` Taylor Blau
2022-11-20 23:36 ` Johannes Schindelin
2022-11-21 0:07 ` Taylor Blau
2022-12-06 15:07 ` Johannes Schindelin via GitGitGadget [this message]
2022-12-06 15:07 ` [PATCH v5 1/2] t0021: use Windows-friendly `pwd` Johannes Schindelin via GitGitGadget
2022-12-06 15:07 ` [PATCH v5 2/2] tests(mingw): avoid very slow `mingw_test_cmp` Johannes Schindelin via GitGitGadget
2022-12-06 18:55 ` Ævar Arnfjörð Bjarmason
2022-12-06 21:52 ` Johannes Sixt
2022-12-06 21:54 ` René Scharfe
2022-12-07 4:33 ` Junio C Hamano
2022-12-07 1:31 ` Taylor Blau
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
List information: http://vger.kernel.org/majordomo-info.html
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=pull.1309.v5.git.1670339267.gitgitgadget@gmail.com \
--to=gitgitgadget@gmail.com \
--cc=avarab@gmail.com \
--cc=git@vger.kernel.org \
--cc=johannes.schindelin@gmx.de \
--cc=l.s.r@web.de \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
Code repositories for project(s) associated with this public inbox
https://80x24.org/mirrors/git.git
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).