git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [PATCH] t3920: don't ignore errors of more than one command with `|| true`
@ 2022-11-21 17:58 Johannes Sixt
  2022-11-21 22:56 ` René Scharfe
                   ` (5 more replies)
  0 siblings, 6 replies; 25+ messages in thread
From: Johannes Sixt @ 2022-11-21 17:58 UTC (permalink / raw)
  To: Philippe Blain; +Cc: René Scharfe, Git Mailing List

It is customary to write `A || true` to ignore a potential error exit of
command A. But when we have a sequence `A && B && C || true && D`, then
a failure of any of A, B, or C skips to D right away. This is not
intended here. Turn the command whose failure is to be ignored into a
compound command to ensure it is the only one that is allowed to fail.

Signed-off-by: Johannes Sixt <j6t@kdbg.org>
---
 t/t3920-crlf-messages.sh | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/t/t3920-crlf-messages.sh b/t/t3920-crlf-messages.sh
index 4c661d4d54..a58522c163 100755
--- a/t/t3920-crlf-messages.sh
+++ b/t/t3920-crlf-messages.sh
@@ -12,7 +12,7 @@ create_crlf_ref () {
 	cat >.crlf-orig-$branch.txt &&
 	cat .crlf-orig-$branch.txt | append_cr >.crlf-message-$branch.txt &&
 	grep 'Subject' .crlf-orig-$branch.txt | tr '\n' ' ' | sed 's/[ ]*$//' | tr -d '\n' >.crlf-subject-$branch.txt &&
-	grep 'Body' .crlf-message-$branch.txt >.crlf-body-$branch.txt || true &&
+	{ grep 'Body' .crlf-message-$branch.txt >.crlf-body-$branch.txt || true; } &&
 	LIB_CRLF_BRANCHES="${LIB_CRLF_BRANCHES} ${branch}" &&
 	test_tick &&
 	hash=$(git commit-tree HEAD^{tree} -p HEAD -F .crlf-message-${branch}.txt) &&
-- 
2.36.0.137.geb37740430

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [PATCH] t3920: don't ignore errors of more than one command with `|| true`
  2022-11-21 17:58 [PATCH] t3920: don't ignore errors of more than one command with `|| true` Johannes Sixt
@ 2022-11-21 22:56 ` René Scharfe
  2022-11-22  0:53   ` Junio C Hamano
  2022-11-22 18:28 ` Philippe Blain
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 25+ messages in thread
From: René Scharfe @ 2022-11-21 22:56 UTC (permalink / raw)
  To: Johannes Sixt, Philippe Blain; +Cc: Git Mailing List

Am 21.11.22 um 18:58 schrieb Johannes Sixt:
> It is customary to write `A || true` to ignore a potential error exit of
> command A. But when we have a sequence `A && B && C || true && D`, then
> a failure of any of A, B, or C skips to D right away. This is not
> intended here. Turn the command whose failure is to be ignored into a
> compound command to ensure it is the only one that is allowed to fail.

Good catch!

> Signed-off-by: Johannes Sixt <j6t@kdbg.org>
> ---
>  t/t3920-crlf-messages.sh | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/t/t3920-crlf-messages.sh b/t/t3920-crlf-messages.sh
> index 4c661d4d54..a58522c163 100755
> --- a/t/t3920-crlf-messages.sh
> +++ b/t/t3920-crlf-messages.sh
> @@ -12,7 +12,7 @@ create_crlf_ref () {
>  	cat >.crlf-orig-$branch.txt &&
>  	cat .crlf-orig-$branch.txt | append_cr >.crlf-message-$branch.txt &&

Useless use of cat.

>  	grep 'Subject' .crlf-orig-$branch.txt | tr '\n' ' ' | sed 's/[ ]*$//' | tr -d '\n' >.crlf-subject-$branch.txt &&

My knee-jerk reaction to long lines like this is to pull out awk:

	awk '/Subject/ {printf "%s", sep $0; sep = " "}' .crlf-orig-$branch.txt >.crlf-subject-$branch.txt &&

This is not a faithful conversion because the original trims all
spaces from the end of the subject for some reason.  That would be:

	awk '/Subject/ {s = s $0 " "} END {sub(/ *$/, "", s); printf "%s", s}' .crlf-orig-$branch.txt >.crlf-subject-$branch.txt &&

> -	grep 'Body' .crlf-message-$branch.txt >.crlf-body-$branch.txt || true &&
> +	{ grep 'Body' .crlf-message-$branch.txt >.crlf-body-$branch.txt || true; } &&

OK, back on topic: Adding CRs explicitly instead of relying on grep to
pass them through (which it doesn't on MinGW) also ignores the return
value of grep as a side effect of the pipe:

	grep 'Body' .crlf-orig-$branch.txt | append_cr >.crlf-body-$branch.txt &&

>  	LIB_CRLF_BRANCHES="${LIB_CRLF_BRANCHES} ${branch}" &&
>  	test_tick &&
>  	hash=$(git commit-tree HEAD^{tree} -p HEAD -F .crlf-message-${branch}.txt) &&

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] t3920: don't ignore errors of more than one command with `|| true`
  2022-11-21 22:56 ` René Scharfe
@ 2022-11-22  0:53   ` Junio C Hamano
  0 siblings, 0 replies; 25+ messages in thread
From: Junio C Hamano @ 2022-11-22  0:53 UTC (permalink / raw)
  To: René Scharfe; +Cc: Johannes Sixt, Philippe Blain, Git Mailing List

René Scharfe <l.s.r@web.de> writes:

>> @@ -12,7 +12,7 @@ create_crlf_ref () {
>>  	cat >.crlf-orig-$branch.txt &&
>>  	cat .crlf-orig-$branch.txt | append_cr >.crlf-message-$branch.txt &&
>
> Useless use of cat.

Very good eyes.

> OK, back on topic: Adding CRs explicitly instead of relying on grep to
> pass them through (which it doesn't on MinGW) also ignores the return
> value of grep as a side effect of the pipe:
>
> 	grep 'Body' .crlf-orig-$branch.txt | append_cr >.crlf-body-$branch.txt &&

Very nice.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] t3920: don't ignore errors of more than one command with `|| true`
  2022-11-21 17:58 [PATCH] t3920: don't ignore errors of more than one command with `|| true` Johannes Sixt
  2022-11-21 22:56 ` René Scharfe
@ 2022-11-22 18:28 ` Philippe Blain
  2022-11-22 22:24 ` Ævar Arnfjörð Bjarmason
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 25+ messages in thread
From: Philippe Blain @ 2022-11-22 18:28 UTC (permalink / raw)
  To: Johannes Sixt; +Cc: René Scharfe, Git Mailing List

Hi Johannes,

Le 2022-11-21 à 12:58, Johannes Sixt a écrit :
> It is customary to write `A || true` to ignore a potential error exit of
> command A. But when we have a sequence `A && B && C || true && D`, then
> a failure of any of A, B, or C skips to D right away. This is not
> intended here. Turn the command whose failure is to be ignored into a
> compound command to ensure it is the only one that is allowed to fail.
> 
> Signed-off-by: Johannes Sixt <j6t@kdbg.org>
> ---
>  t/t3920-crlf-messages.sh | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/t/t3920-crlf-messages.sh b/t/t3920-crlf-messages.sh
> index 4c661d4d54..a58522c163 100755
> --- a/t/t3920-crlf-messages.sh
> +++ b/t/t3920-crlf-messages.sh
> @@ -12,7 +12,7 @@ create_crlf_ref () {
>  	cat >.crlf-orig-$branch.txt &&
>  	cat .crlf-orig-$branch.txt | append_cr >.crlf-message-$branch.txt &&
>  	grep 'Subject' .crlf-orig-$branch.txt | tr '\n' ' ' | sed 's/[ ]*$//' | tr -d '\n' >.crlf-subject-$branch.txt &&
> -	grep 'Body' .crlf-message-$branch.txt >.crlf-body-$branch.txt || true &&
> +	{ grep 'Body' .crlf-message-$branch.txt >.crlf-body-$branch.txt || true; } &&
>  	LIB_CRLF_BRANCHES="${LIB_CRLF_BRANCHES} ${branch}" &&
>  	test_tick &&
>  	hash=$(git commit-tree HEAD^{tree} -p HEAD -F .crlf-message-${branch}.txt) &&
> 

Thank you for making that function more robust.

Philippe.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] t3920: don't ignore errors of more than one command with `|| true`
  2022-11-21 17:58 [PATCH] t3920: don't ignore errors of more than one command with `|| true` Johannes Sixt
  2022-11-21 22:56 ` René Scharfe
  2022-11-22 18:28 ` Philippe Blain
@ 2022-11-22 22:24 ` Ævar Arnfjörð Bjarmason
  2022-11-22 22:37   ` Johannes Sixt
  2022-12-02 16:51 ` [PATCH 2/1] t3920: support CR-eating grep René Scharfe
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 25+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-11-22 22:24 UTC (permalink / raw)
  To: Johannes Sixt; +Cc: Philippe Blain, René Scharfe, Git Mailing List


On Mon, Nov 21 2022, Johannes Sixt wrote:

> It is customary to write `A || true` to ignore a potential error exit of
> command A. But when we have a sequence `A && B && C || true && D`, then
> a failure of any of A, B, or C skips to D right away. This is not
> intended here. Turn the command whose failure is to be ignored into a
> compound command to ensure it is the only one that is allowed to fail.

> Signed-off-by: Johannes Sixt <j6t@kdbg.org>
> ---
>  t/t3920-crlf-messages.sh | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/t/t3920-crlf-messages.sh b/t/t3920-crlf-messages.sh
> index 4c661d4d54..a58522c163 100755
> --- a/t/t3920-crlf-messages.sh
> +++ b/t/t3920-crlf-messages.sh
> @@ -12,7 +12,7 @@ create_crlf_ref () {
>  	cat >.crlf-orig-$branch.txt &&
>  	cat .crlf-orig-$branch.txt | append_cr >.crlf-message-$branch.txt &&
>  	grep 'Subject' .crlf-orig-$branch.txt | tr '\n' ' ' | sed 's/[ ]*$//' | tr -d '\n' >.crlf-subject-$branch.txt &&
> -	grep 'Body' .crlf-message-$branch.txt >.crlf-body-$branch.txt || true &&
> +	{ grep 'Body' .crlf-message-$branch.txt >.crlf-body-$branch.txt || true; } &&
>  	LIB_CRLF_BRANCHES="${LIB_CRLF_BRANCHES} ${branch}" &&
>  	test_tick &&
>  	hash=$(git commit-tree HEAD^{tree} -p HEAD -F .crlf-message-${branch}.txt) &&

Any reason not to make this:

	-       grep 'Body' .crlf-message-$branch.txt >.crlf-body-$branch.txt || true &&
	+       sed -ne '/Body/p' <.crlf-message-$branch.txt >.crlf-body-$branch.txt &&

?

We usually use that for "grep when we don't care about the exit
code". But maybe some CRLF concerns in this code don't allow it (I only
tested this on *nix).

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] t3920: don't ignore errors of more than one command with `|| true`
  2022-11-22 22:24 ` Ævar Arnfjörð Bjarmason
@ 2022-11-22 22:37   ` Johannes Sixt
  2022-11-22 22:57     ` Ævar Arnfjörð Bjarmason
  0 siblings, 1 reply; 25+ messages in thread
From: Johannes Sixt @ 2022-11-22 22:37 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Philippe Blain, René Scharfe, Git Mailing List

Am 22.11.22 um 23:24 schrieb Ævar Arnfjörð Bjarmason:
> On Mon, Nov 21 2022, Johannes Sixt wrote:
>> -	grep 'Body' .crlf-message-$branch.txt >.crlf-body-$branch.txt || true &&
>> +	{ grep 'Body' .crlf-message-$branch.txt >.crlf-body-$branch.txt || true; } &&
> 
> Any reason not to make this:
> 
> 	-       grep 'Body' .crlf-message-$branch.txt >.crlf-body-$branch.txt || true &&
> 	+       sed -ne '/Body/p' <.crlf-message-$branch.txt >.crlf-body-$branch.txt &&
> 
> ?

Yes: I have tested my version, but not this one.

-- Hannes


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] t3920: don't ignore errors of more than one command with `|| true`
  2022-11-22 22:37   ` Johannes Sixt
@ 2022-11-22 22:57     ` Ævar Arnfjörð Bjarmason
  2022-11-23  0:55       ` Junio C Hamano
  0 siblings, 1 reply; 25+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-11-22 22:57 UTC (permalink / raw)
  To: Johannes Sixt; +Cc: Philippe Blain, René Scharfe, Git Mailing List


On Tue, Nov 22 2022, Johannes Sixt wrote:

> Am 22.11.22 um 23:24 schrieb Ævar Arnfjörð Bjarmason:
>> On Mon, Nov 21 2022, Johannes Sixt wrote:
>>> -	grep 'Body' .crlf-message-$branch.txt >.crlf-body-$branch.txt || true &&
>>> +	{ grep 'Body' .crlf-message-$branch.txt >.crlf-body-$branch.txt || true; } &&
>> 
>> Any reason not to make this:
>> 
>> 	-       grep 'Body' .crlf-message-$branch.txt >.crlf-body-$branch.txt || true &&
>> 	+       sed -ne '/Body/p' <.crlf-message-$branch.txt >.crlf-body-$branch.txt &&
>> 
>> ?
>
> Yes: I have tested my version, but not this one.

I don't find that too surprising, as unless there's a discussion of "I
tried xyz, but it didn't work, so..." a submitted patch is unlikely to
have tried various alternatives for such a trivial fix, but just gone
with whatever the author tried first.

But in case it wasn't clear, the implied suggestion is that unless there
is a good reason to introduce a pattern of:

	 { <cmd> in >out || true; } &&

That we should use another suitable command with the simpler use of:

	 <cmd2> <in >out &&

If the result is equivalent, as subsequent maintainers won't need to
puzzle over the seemingly odd pattern.

We have that "sed -n -e" in somewhat wide use:

	$ git grep 'sed (-n -e|-ne).*/p.*&&' | wc -l
	54

The only existing occurance of this "grep but ignore the exit code" I
could find was:

	t/t9001-send-email.sh:  { grep '^X-Mailer:' out || :; } >mailer &&

For which we can also:

	-       { grep '^X-Mailer:' out || :; } >mailer &&
	+       sed -ne '/^X-Mailer:/p' <out >mailer &&

And which I'd think would be great to have in a re-roll, i.e. "here's
these two cases where a grep-but-dont-care-about-the-exit-code could be
replaced by sed -ne".

But if re-testing this is tedious etc. I don't mind if it goes in as-is,
but then I'd think we could safely emulate the t9001-send-email.sh
pattern of using ":" instead of "true"; we don't need to invoke another
process just to ignore the exit code.




^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] t3920: don't ignore errors of more than one command with `|| true`
  2022-11-22 22:57     ` Ævar Arnfjörð Bjarmason
@ 2022-11-23  0:55       ` Junio C Hamano
  0 siblings, 0 replies; 25+ messages in thread
From: Junio C Hamano @ 2022-11-23  0:55 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Johannes Sixt, Philippe Blain, René Scharfe,
	Git Mailing List

Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:

> We have that "sed -n -e" in somewhat wide use:
>
> 	$ git grep 'sed (-n -e|-ne).*/p.*&&' | wc -l
> 	54
>
> The only existing occurance of this "grep but ignore the exit code" I
> could find was:
>
> 	t/t9001-send-email.sh:  { grep '^X-Mailer:' out || :; } >mailer &&
>
> For which we can also:
>
> 	-       { grep '^X-Mailer:' out || :; } >mailer &&
> 	+       sed -ne '/^X-Mailer:/p' <out >mailer &&
>
> And which I'd think would be great to have in a re-roll, i.e. "here's
> these two cases where a grep-but-dont-care-about-the-exit-code could be
> replaced by sed -ne".
>
> But if re-testing this is tedious etc. I don't mind if it goes in as-is,
> but then I'd think we could safely emulate the t9001-send-email.sh
> pattern of using ":" instead of "true"; we don't need to invoke another
> process just to ignore the exit code.

Let's leave all of the above (mostly good ideas) for "after the dust
settles" clean-up.

If sed is much slower than grep (for such a small string use case,
start-up cost of a process would dwarf everybody else), "grep || :"
might be justifiable, but other than that I do not think of a good
reason why we might favor "grep || :" offhand over "sed -ne //p".




^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH 2/1] t3920: support CR-eating grep
  2022-11-21 17:58 [PATCH] t3920: don't ignore errors of more than one command with `|| true` Johannes Sixt
                   ` (2 preceding siblings ...)
  2022-11-22 22:24 ` Ævar Arnfjörð Bjarmason
@ 2022-12-02 16:51 ` René Scharfe
  2022-12-02 23:14   ` Philippe Blain
                     ` (2 more replies)
  2022-12-02 16:51 ` [PATCH 3/1] t3920: simplify redirection of loop output René Scharfe
  2022-12-02 16:51 ` [PATCH 4/1] t3920: replace two cats with a tee René Scharfe
  5 siblings, 3 replies; 25+ messages in thread
From: René Scharfe @ 2022-12-02 16:51 UTC (permalink / raw)
  To: Johannes Sixt, Philippe Blain; +Cc: Git Mailing List, Junio C Hamano

grep(1) converts CRLF line endings to CR on current MinGW:

   $ uname -sr
   MINGW64_NT-10.0-22621 3.3.6-341.x86_64

   $ printf 'a\r\n' | hexdump.exe -C
   00000000  61 0d 0a                                          |a..|
   00000003

   $ printf 'a\r\n' | grep . | hexdump.exe -C
   00000000  61 0a                                             |a.|
   00000002

Create the intended test file by grepping the original file with LF
line endings and adding CRs explicitly.

The missing CRs went unnoticed because test_cmp on MinGW ignores line
endings since 4d715ac05c (Windows: a test_cmp that is agnostic to random
LF <> CRLF conversions, 2013-10-26).  Fix this test anyway to avoid
depending on that special test_cmp behavior, especially since this is
the only test that needs it.

Piping the output of grep(1) through append_cr has the side-effect of
ignoring its return value.  That means we no longer need the explicit
"|| true" to support commit messages without a body.

Signed-off-by: René Scharfe <l.s.r@web.de>
---
 t/t3920-crlf-messages.sh | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/t/t3920-crlf-messages.sh b/t/t3920-crlf-messages.sh
index a58522c163..67fd2345af 100755
--- a/t/t3920-crlf-messages.sh
+++ b/t/t3920-crlf-messages.sh
@@ -12,7 +12,7 @@ create_crlf_ref () {
 	cat >.crlf-orig-$branch.txt &&
 	cat .crlf-orig-$branch.txt | append_cr >.crlf-message-$branch.txt &&
 	grep 'Subject' .crlf-orig-$branch.txt | tr '\n' ' ' | sed 's/[ ]*$//' | tr -d '\n' >.crlf-subject-$branch.txt &&
-	{ grep 'Body' .crlf-message-$branch.txt >.crlf-body-$branch.txt || true; } &&
+	grep 'Body' .crlf-orig-$branch.txt | append_cr >.crlf-body-$branch.txt &&
 	LIB_CRLF_BRANCHES="${LIB_CRLF_BRANCHES} ${branch}" &&
 	test_tick &&
 	hash=$(git commit-tree HEAD^{tree} -p HEAD -F .crlf-message-${branch}.txt) &&
--
2.38.1.windows.1

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 3/1] t3920: simplify redirection of loop output
  2022-11-21 17:58 [PATCH] t3920: don't ignore errors of more than one command with `|| true` Johannes Sixt
                   ` (3 preceding siblings ...)
  2022-12-02 16:51 ` [PATCH 2/1] t3920: support CR-eating grep René Scharfe
@ 2022-12-02 16:51 ` René Scharfe
  2022-12-02 16:51 ` [PATCH 4/1] t3920: replace two cats with a tee René Scharfe
  5 siblings, 0 replies; 25+ messages in thread
From: René Scharfe @ 2022-12-02 16:51 UTC (permalink / raw)
  To: Johannes Sixt, Philippe Blain; +Cc: Git Mailing List, junio C Hamano

Redirect the output of for loops once instead of deleting the receiving
file and appending the output of individual commands.  This reduces
repetition and speeds up the test.  Here are the numbers on my Windows
machine before:

   $ (cd t && hyperfine.exe -w3 "sh.exe t3920-crlf-messages.sh")
   Benchmark 1: sh.exe t3920-crlf-messages.sh
     Time (mean ± σ):      5.923 s ±  0.037 s    [User: 0.000 s, System: 0.004 s]
     Range (min … max):    5.889 s …  5.997 s    10 runs

... and with this patch:

   Benchmark 1: sh.exe t3920-crlf-messages.sh
     Time (mean ± σ):      5.705 s ±  0.047 s    [User: 0.000 s, System: 0.001 s]
     Range (min … max):    5.632 s …  5.772 s    10 runs

Signed-off-by: René Scharfe <l.s.r@web.de>
---
 t/t3920-crlf-messages.sh | 16 +++++++---------
 1 file changed, 7 insertions(+), 9 deletions(-)

diff --git a/t/t3920-crlf-messages.sh b/t/t3920-crlf-messages.sh
index 67fd2345af..4fc9fa9cad 100755
--- a/t/t3920-crlf-messages.sh
+++ b/t/t3920-crlf-messages.sh
@@ -68,12 +68,11 @@ test_crlf_subject_body_and_contents() {
 		set ${atoms} && atom=$1 && shift && atoms="$*" &&
 		set ${files} && file=$1 && shift && files="$*" &&
 		test_expect_success "${command}: --format='%${atom}' works with messages using CRLF" "
-			rm -f expect &&
 			for ref in ${LIB_CRLF_BRANCHES}
 			do
-				cat .crlf-${file}-\"\${ref}\".txt >>expect &&
-				printf \"\n\" >>expect || return 1
-			done &&
+				cat .crlf-${file}-\"\${ref}\".txt &&
+				printf \"\n\" || return 1
+			done >expect &&
 			git $command_and_args --format=\"%${atom}\" >actual &&
 			test_cmp expect actual
 		"
@@ -87,13 +86,12 @@ test_expect_success 'Setup refs with commit and tag messages using CRLF' '
 '

 test_expect_success 'branch: --verbose works with messages using CRLF' '
-	rm -f expect &&
 	for branch in $LIB_CRLF_BRANCHES
 	do
-		printf "  " >>expect &&
-		cat .crlf-subject-${branch}.txt >>expect &&
-		printf "\n" >>expect || return 1
-	done &&
+		printf "  " &&
+		cat .crlf-subject-${branch}.txt &&
+		printf "\n" t || return 1
+	done >expect &&
 	git branch -v >tmp &&
 	# Remove first two columns, and the line for the currently checked out branch
 	current=$(git branch --show-current) &&
--
2.38.1.windows.1

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 4/1] t3920: replace two cats with a tee
  2022-11-21 17:58 [PATCH] t3920: don't ignore errors of more than one command with `|| true` Johannes Sixt
                   ` (4 preceding siblings ...)
  2022-12-02 16:51 ` [PATCH 3/1] t3920: simplify redirection of loop output René Scharfe
@ 2022-12-02 16:51 ` René Scharfe
  2022-12-03  5:09   ` Eric Sunshine
  5 siblings, 1 reply; 25+ messages in thread
From: René Scharfe @ 2022-12-02 16:51 UTC (permalink / raw)
  To: Johannes Sixt, Philippe Blain; +Cc: Git Mailing List, junio C Hamano

Use tee(1) to replace two calls of cat(1) for writing files with
different line endings.  That's shorter and spawns less processes.

It has a small, but measurable performance impact on my Windows machine.
Here are the numbers before:

   $ (cd t && hyperfine.exe -w3 "sh.exe t3920-crlf-messages.sh")
   Benchmark 1: sh.exe t3920-crlf-messages.sh
     Time (mean ± σ):      5.705 s ±  0.047 s    [User: 0.000 s, System: 0.001 s]
     Range (min … max):    5.632 s …  5.772 s    10 runs

... and with this patch:

   $ (cd t && hyperfine.exe -w3 "sh.exe t3920-crlf-messages.sh")
   Benchmark 1: sh.exe t3920-crlf-messages.sh
     Time (mean ± σ):      5.616 s ±  0.021 s    [User: 0.001 s, System: 0.002 s]
     Range (min … max):    5.577 s …  5.644 s    10 runs

Signed-off-by: René Scharfe <l.s.r@web.de>
---
 t/t3920-crlf-messages.sh | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/t/t3920-crlf-messages.sh b/t/t3920-crlf-messages.sh
index 4fc9fa9cad..1f64ce565f 100755
--- a/t/t3920-crlf-messages.sh
+++ b/t/t3920-crlf-messages.sh
@@ -9,8 +9,7 @@ LIB_CRLF_BRANCHES=""

 create_crlf_ref () {
 	branch="$1" &&
-	cat >.crlf-orig-$branch.txt &&
-	cat .crlf-orig-$branch.txt | append_cr >.crlf-message-$branch.txt &&
+	tee .crlf-orig-$branch.txt | append_cr >.crlf-message-$branch.txt &&
 	grep 'Subject' .crlf-orig-$branch.txt | tr '\n' ' ' | sed 's/[ ]*$//' | tr -d '\n' >.crlf-subject-$branch.txt &&
 	grep 'Body' .crlf-orig-$branch.txt | append_cr >.crlf-body-$branch.txt &&
 	LIB_CRLF_BRANCHES="${LIB_CRLF_BRANCHES} ${branch}" &&
--
2.38.1.windows.1

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [PATCH 2/1] t3920: support CR-eating grep
  2022-12-02 16:51 ` [PATCH 2/1] t3920: support CR-eating grep René Scharfe
@ 2022-12-02 23:14   ` Philippe Blain
  2022-12-03  7:09     ` René Scharfe
  2022-12-02 23:32   ` Eric Sunshine
  2022-12-05  1:08   ` Junio C Hamano
  2 siblings, 1 reply; 25+ messages in thread
From: Philippe Blain @ 2022-12-02 23:14 UTC (permalink / raw)
  To: René Scharfe, Johannes Sixt; +Cc: Git Mailing List, Junio C Hamano

Hi René,

Le 2022-12-02 à 11:51, René Scharfe a écrit :
> grep(1) converts CRLF line endings to CR on current MinGW:
> 
>    $ uname -sr
>    MINGW64_NT-10.0-22621 3.3.6-341.x86_64

I don't really know the conventions in this regards, but for me 
"MinGW" refers to the port of GCC to Windows. Here it would be more 
correct to refer to "the Git for Windows flavor of MSYS2" or something
like this, in my (maybe uninformed) opinion.

> 
>    $ printf 'a\r\n' | hexdump.exe -C
>    00000000  61 0d 0a                                          |a..|
>    00000003
> 
>    $ printf 'a\r\n' | grep . | hexdump.exe -C
>    00000000  61 0a                                             |a.|
>    00000002
> 
> Create the intended test file by grepping the original file with LF
> line endings and adding CRs explicitly.
> 
> The missing CRs went unnoticed because test_cmp on MinGW ignores line
> endings since 4d715ac05c (Windows: a test_cmp that is agnostic to random
> LF <> CRLF conversions, 2013-10-26). 

Reading that commit's messages, if I understand correctly it's more MSYS2's Bash
that is the culprit, rather than its grep, no?

> Fix this test anyway to avoid
> depending on that special test_cmp behavior, especially since this is
> the only test that needs it.

Do you mean the only test in that file, or in the whole Git test suite (which would
mean after this series we could completely remove the Windows-specific 'test_cmp' ) ?

> 
> Piping the output of grep(1) through append_cr has the side-effect of
> ignoring its return value.  That means we no longer need the explicit
> "|| true" to support commit messages without a body.
> 
> Signed-off-by: René Scharfe <l.s.r@web.de>
> ---
>  t/t3920-crlf-messages.sh | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/t/t3920-crlf-messages.sh b/t/t3920-crlf-messages.sh
> index a58522c163..67fd2345af 100755
> --- a/t/t3920-crlf-messages.sh
> +++ b/t/t3920-crlf-messages.sh
> @@ -12,7 +12,7 @@ create_crlf_ref () {
>  	cat >.crlf-orig-$branch.txt &&
>  	cat .crlf-orig-$branch.txt | append_cr >.crlf-message-$branch.txt &&
>  	grep 'Subject' .crlf-orig-$branch.txt | tr '\n' ' ' | sed 's/[ ]*$//' | tr -d '\n' >.crlf-subject-$branch.txt &&
> -	{ grep 'Body' .crlf-message-$branch.txt >.crlf-body-$branch.txt || true; } &&
> +	grep 'Body' .crlf-orig-$branch.txt | append_cr >.crlf-body-$branch.txt &&
>  	LIB_CRLF_BRANCHES="${LIB_CRLF_BRANCHES} ${branch}" &&
>  	test_tick &&
>  	hash=$(git commit-tree HEAD^{tree} -p HEAD -F .crlf-message-${branch}.txt) &&
> --
> 2.38.1.windows.1
> 

apart from the minor things in the commit message, the 3 patches look good to me :) 
Here is my Acked-by: for all 3 :

Acked-by: Philippe Blain <levraiphilippeblain@gmail.com>

Thanks for detailed messages as always, it definitely went over my radar at the time
that both 'grep' and  'test_cmp' acted differently on Windows.

Cheers,
Philippe.
p.s. I wonder what happened for format-patch to generate these 2/1, 3/1 and 4/1 patch prefixes :P

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 2/1] t3920: support CR-eating grep
  2022-12-02 16:51 ` [PATCH 2/1] t3920: support CR-eating grep René Scharfe
  2022-12-02 23:14   ` Philippe Blain
@ 2022-12-02 23:32   ` Eric Sunshine
  2022-12-03  7:12     ` René Scharfe
  2022-12-05  1:08   ` Junio C Hamano
  2 siblings, 1 reply; 25+ messages in thread
From: Eric Sunshine @ 2022-12-02 23:32 UTC (permalink / raw)
  To: René Scharfe
  Cc: Johannes Sixt, Philippe Blain, Git Mailing List, Junio C Hamano

On Fri, Dec 2, 2022 at 11:51 AM René Scharfe <l.s.r@web.de> wrote:
> grep(1) converts CRLF line endings to CR on current MinGW:

Did you mean s/to CR/to LF/ ?

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 4/1] t3920: replace two cats with a tee
  2022-12-02 16:51 ` [PATCH 4/1] t3920: replace two cats with a tee René Scharfe
@ 2022-12-03  5:09   ` Eric Sunshine
  2022-12-03  8:43     ` René Scharfe
  0 siblings, 1 reply; 25+ messages in thread
From: Eric Sunshine @ 2022-12-03  5:09 UTC (permalink / raw)
  To: René Scharfe
  Cc: Johannes Sixt, Philippe Blain, Git Mailing List, junio C Hamano

On Fri, Dec 2, 2022 at 11:51 AM René Scharfe <l.s.r@web.de> wrote:
> Use tee(1) to replace two calls of cat(1) for writing files with
> different line endings.  That's shorter and spawns less processes.
> [...]
> Signed-off-by: René Scharfe <l.s.r@web.de>
> ---
> diff --git a/t/t3920-crlf-messages.sh b/t/t3920-crlf-messages.sh
> @@ -9,8 +9,7 @@ LIB_CRLF_BRANCHES=""
>  create_crlf_ref () {
> -       cat >.crlf-orig-$branch.txt &&
> -       cat .crlf-orig-$branch.txt | append_cr >.crlf-message-$branch.txt &&
> +       tee .crlf-orig-$branch.txt | append_cr >.crlf-message-$branch.txt &&

This feels slightly magical and more difficult to reason about than
using simple redirection to eliminate the second `cat`. Wouldn't this
work just as well?

    cat >.crlf-orig-$branch.txt &&
    append_cr <.crlf-orig-$branch.txt >.crlf-message-$branch.txt &&

(Plus, this avoids introducing `tee` into the test suite, more or
less. The few existing instances are all from the same test author and
don't seem particularly legitimate -- they appear to be aids the
author used while developing the test to be able to watch its output
as it ran.)

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 2/1] t3920: support CR-eating grep
  2022-12-02 23:14   ` Philippe Blain
@ 2022-12-03  7:09     ` René Scharfe
  0 siblings, 0 replies; 25+ messages in thread
From: René Scharfe @ 2022-12-03  7:09 UTC (permalink / raw)
  To: Philippe Blain, Johannes Sixt; +Cc: Git Mailing List, Junio C Hamano

Am 03.12.22 um 00:14 schrieb Philippe Blain:
> Hi René,
>
> Le 2022-12-02 à 11:51, René Scharfe a écrit :
>> grep(1) converts CRLF line endings to CR on current MinGW:
>>
>>    $ uname -sr
>>    MINGW64_NT-10.0-22621 3.3.6-341.x86_64
>
> I don't really know the conventions in this regards, but for me
> "MinGW" refers to the port of GCC to Windows. Here it would be more
> correct to refer to "the Git for Windows flavor of MSYS2" or something
> like this, in my (maybe uninformed) opinion.

The thing I downloaded and installed is the "Git for Windows SDK", so I
could use that name instead.

>
>>
>>    $ printf 'a\r\n' | hexdump.exe -C
>>    00000000  61 0d 0a                                          |a..|
>>    00000003
>>
>>    $ printf 'a\r\n' | grep . | hexdump.exe -C
>>    00000000  61 0a                                             |a.|
>>    00000002
>>
>> Create the intended test file by grepping the original file with LF
>> line endings and adding CRs explicitly.
>>
>> The missing CRs went unnoticed because test_cmp on MinGW ignores line
>> endings since 4d715ac05c (Windows: a test_cmp that is agnostic to random
>> LF <> CRLF conversions, 2013-10-26).
>
> Reading that commit's messages, if I understand correctly it's more MSYS2's Bash
> that is the culprit, rather than its grep, no?

4d715ac05c blames bash for converting LF to CRLF.  Above I blame grep
for converting CRLF to LF.  The experiment to confirm the latter leaves
the possibility that bash somehow inserts a CR eater between grep and
hexdump.  If it does then it is quite selective; e.g. cat passes CRs on
unscathed:

   $ printf 'a\r\nb\n' | grep . | hexdump.exe -C
   00000000  61 0a 62 0a                                       |a.b.|
   00000004

   $ printf 'a\r\nb\n' | cat | hexdump.exe -C
   00000000  61 0d 0a 62 0a                                    |a..b.|
   00000005

>> Fix this test anyway to avoid
>> depending on that special test_cmp behavior, especially since this is
>> the only test that needs it.
>
> Do you mean the only test in that file, or in the whole Git test suite (which would
> mean after this series we could completely remove the Windows-specific 'test_cmp' ) ?

Both.

>>
>> Piping the output of grep(1) through append_cr has the side-effect of
>> ignoring its return value.  That means we no longer need the explicit
>> "|| true" to support commit messages without a body.
>>
>> Signed-off-by: René Scharfe <l.s.r@web.de>
>> ---
>>  t/t3920-crlf-messages.sh | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/t/t3920-crlf-messages.sh b/t/t3920-crlf-messages.sh
>> index a58522c163..67fd2345af 100755
>> --- a/t/t3920-crlf-messages.sh
>> +++ b/t/t3920-crlf-messages.sh
>> @@ -12,7 +12,7 @@ create_crlf_ref () {
>>  	cat >.crlf-orig-$branch.txt &&
>>  	cat .crlf-orig-$branch.txt | append_cr >.crlf-message-$branch.txt &&
>>  	grep 'Subject' .crlf-orig-$branch.txt | tr '\n' ' ' | sed 's/[ ]*$//' | tr -d '\n' >.crlf-subject-$branch.txt &&
>> -	{ grep 'Body' .crlf-message-$branch.txt >.crlf-body-$branch.txt || true; } &&
>> +	grep 'Body' .crlf-orig-$branch.txt | append_cr >.crlf-body-$branch.txt &&
>>  	LIB_CRLF_BRANCHES="${LIB_CRLF_BRANCHES} ${branch}" &&
>>  	test_tick &&
>>  	hash=$(git commit-tree HEAD^{tree} -p HEAD -F .crlf-message-${branch}.txt) &&
>> --
>> 2.38.1.windows.1
>>
>
> apart from the minor things in the commit message, the 3 patches look good to me :)
> Here is my Acked-by: for all 3 :
>
> Acked-by: Philippe Blain <levraiphilippeblain@gmail.com>
>
> Thanks for detailed messages as always, it definitely went over my radar at the time
> that both 'grep' and  'test_cmp' acted differently on Windows.
>
> Cheers,
> Philippe.
> p.s. I wonder what happened for format-patch to generate these 2/1, 3/1 and 4/1 patch prefixes :P

Hand-edited..

René

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 2/1] t3920: support CR-eating grep
  2022-12-02 23:32   ` Eric Sunshine
@ 2022-12-03  7:12     ` René Scharfe
  0 siblings, 0 replies; 25+ messages in thread
From: René Scharfe @ 2022-12-03  7:12 UTC (permalink / raw)
  To: Eric Sunshine
  Cc: Johannes Sixt, Philippe Blain, Git Mailing List, Junio C Hamano

Am 03.12.22 um 00:32 schrieb Eric Sunshine:
> On Fri, Dec 2, 2022 at 11:51 AM René Scharfe <l.s.r@web.de> wrote:
>> grep(1) converts CRLF line endings to CR on current MinGW:
>
> Did you mean s/to CR/to LF/ ?

Yes.

René

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 4/1] t3920: replace two cats with a tee
  2022-12-03  5:09   ` Eric Sunshine
@ 2022-12-03  8:43     ` René Scharfe
  2022-12-03 12:53       ` Ævar Arnfjörð Bjarmason
  0 siblings, 1 reply; 25+ messages in thread
From: René Scharfe @ 2022-12-03  8:43 UTC (permalink / raw)
  To: Eric Sunshine
  Cc: Johannes Sixt, Philippe Blain, Git Mailing List, junio C Hamano

Am 03.12.22 um 06:09 schrieb Eric Sunshine:
> On Fri, Dec 2, 2022 at 11:51 AM René Scharfe <l.s.r@web.de> wrote:
>> Use tee(1) to replace two calls of cat(1) for writing files with
>> different line endings.  That's shorter and spawns less processes.
>> [...]
>> Signed-off-by: René Scharfe <l.s.r@web.de>
>> ---
>> diff --git a/t/t3920-crlf-messages.sh b/t/t3920-crlf-messages.sh
>> @@ -9,8 +9,7 @@ LIB_CRLF_BRANCHES=""
>>  create_crlf_ref () {
>> -       cat >.crlf-orig-$branch.txt &&
>> -       cat .crlf-orig-$branch.txt | append_cr >.crlf-message-$branch.txt &&
>> +       tee .crlf-orig-$branch.txt | append_cr >.crlf-message-$branch.txt &&
>
> This feels slightly magical and more difficult to reason about than
> using simple redirection to eliminate the second `cat`. Wouldn't this
> work just as well?
>
>     cat >.crlf-orig-$branch.txt &&
>     append_cr <.crlf-orig-$branch.txt >.crlf-message-$branch.txt &&

It would work, of course, but this is the exact use case for tee(1).  No
repetition, no extra redirection symbols, just an nicely fitting piece
of pipework.  Don't fear the tee! ;-)

(I'm delighted to learn from https://en.wikipedia.org/wiki/Tee_(command)
that PowerShell has a tee command as well.)

> (Plus, this avoids introducing `tee` into the test suite, more or
> less. The few existing instances are all from the same test author and
> don't seem particularly legitimate -- they appear to be aids the
> author used while developing the test to be able to watch its output
> as it ran.)

I agree that the tee calls in t1001 and t5523 are unnecessary.

René

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 4/1] t3920: replace two cats with a tee
  2022-12-03  8:43     ` René Scharfe
@ 2022-12-03 12:53       ` Ævar Arnfjörð Bjarmason
  2022-12-03 17:22         ` René Scharfe
  0 siblings, 1 reply; 25+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-12-03 12:53 UTC (permalink / raw)
  To: René Scharfe
  Cc: Eric Sunshine, Johannes Sixt, Philippe Blain, Git Mailing List,
	junio C Hamano


On Sat, Dec 03 2022, René Scharfe wrote:

> Am 03.12.22 um 06:09 schrieb Eric Sunshine:
>> On Fri, Dec 2, 2022 at 11:51 AM René Scharfe <l.s.r@web.de> wrote:
>>> Use tee(1) to replace two calls of cat(1) for writing files with
>>> different line endings.  That's shorter and spawns less processes.
>>> [...]
>>> Signed-off-by: René Scharfe <l.s.r@web.de>
>>> ---
>>> diff --git a/t/t3920-crlf-messages.sh b/t/t3920-crlf-messages.sh
>>> @@ -9,8 +9,7 @@ LIB_CRLF_BRANCHES=""
>>>  create_crlf_ref () {
>>> -       cat >.crlf-orig-$branch.txt &&
>>> -       cat .crlf-orig-$branch.txt | append_cr >.crlf-message-$branch.txt &&
>>> +       tee .crlf-orig-$branch.txt | append_cr >.crlf-message-$branch.txt &&
>>
>> This feels slightly magical and more difficult to reason about than
>> using simple redirection to eliminate the second `cat`. Wouldn't this
>> work just as well?
>>
>>     cat >.crlf-orig-$branch.txt &&
>>     append_cr <.crlf-orig-$branch.txt >.crlf-message-$branch.txt &&
>
> It would work, of course, but this is the exact use case for tee(1).  No
> repetition, no extra redirection symbols, just an nicely fitting piece
> of pipework.  Don't fear the tee! ;-)
>
> (I'm delighted to learn from https://en.wikipedia.org/wiki/Tee_(command)
> that PowerShell has a tee command as well.)

I don't really care, but I must say I agree with Eric here. Not having
surprising patterns in the test suite has a value of its own.

In this case I wonder if you want to optimize this whether we couldn't
do much better with "test_commit_bulk", maybe by teaching it a small set
of new tricks.

I.e. if I do:

	git fast-export --all

At the end of the setup test it seems we just end up with refs with
names that correspond to their contents, and with double newlines in
them or whatever. This is a lot of "grep", "sed", "tr" etc. just to end
up with that.

So maybe we can create them as a patch, possibly with some slight "sed"
munging on the input stream, just just teach it to accept a "ref prefix"
and "commit message contents". That could just be an argument that you
"$(printf "...")", so we don't even need a sub-process....

Also this:

     perl -wE 'say for 1..1024*100' | tee /tmp/x | perl -nE 'print "in: $_"; exit 1 if $_ == 512'; tail -n 1 /tmp/x

Isn't deterministic. Now, in this case I doubt it matters, but it's nice
to have intermediate files in the test suite be determanistic, i.e. to
always have the full content be in the file at the top after the "top".

With a "tee" you need to worry about the "append_cr" function it's being
piped in stopping the stdin.

I don't think it matters in this case, but in general as a pattern: I do
fear the "tee" a bit :)

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 4/1] t3920: replace two cats with a tee
  2022-12-03 12:53       ` Ævar Arnfjörð Bjarmason
@ 2022-12-03 17:22         ` René Scharfe
  2022-12-04  9:34           ` Ævar Arnfjörð Bjarmason
  0 siblings, 1 reply; 25+ messages in thread
From: René Scharfe @ 2022-12-03 17:22 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Eric Sunshine, Johannes Sixt, Philippe Blain, Git Mailing List,
	Junio C Hamano

Am 03.12.22 um 13:53 schrieb Ævar Arnfjörð Bjarmason:
>
> On Sat, Dec 03 2022, René Scharfe wrote:
>
>> Am 03.12.22 um 06:09 schrieb Eric Sunshine:
>>> On Fri, Dec 2, 2022 at 11:51 AM René Scharfe <l.s.r@web.de> wrote:
>>>> Use tee(1) to replace two calls of cat(1) for writing files with
>>>> different line endings.  That's shorter and spawns less processes.
>>>> [...]
>>>> Signed-off-by: René Scharfe <l.s.r@web.de>
>>>> ---
>>>> diff --git a/t/t3920-crlf-messages.sh b/t/t3920-crlf-messages.sh
>>>> @@ -9,8 +9,7 @@ LIB_CRLF_BRANCHES=""
>>>>  create_crlf_ref () {
>>>> -       cat >.crlf-orig-$branch.txt &&
>>>> -       cat .crlf-orig-$branch.txt | append_cr >.crlf-message-$branch.txt &&
>>>> +       tee .crlf-orig-$branch.txt | append_cr >.crlf-message-$branch.txt &&
>>>
>>> This feels slightly magical and more difficult to reason about than
>>> using simple redirection to eliminate the second `cat`. Wouldn't this
>>> work just as well?
>>>
>>>     cat >.crlf-orig-$branch.txt &&
>>>     append_cr <.crlf-orig-$branch.txt >.crlf-message-$branch.txt &&
>>
>> It would work, of course, but this is the exact use case for tee(1).  No
>> repetition, no extra redirection symbols, just an nicely fitting piece
>> of pipework.  Don't fear the tee! ;-)
>>
>> (I'm delighted to learn from https://en.wikipedia.org/wiki/Tee_(command)
>> that PowerShell has a tee command as well.)
>
> I don't really care, but I must say I agree with Eric here. Not having
> surprising patterns in the test suite has a value of its own.

That's a good general guideline, but I wouldn't have expected a pipe
with three holes to startle anyone. *shrug*

> In this case I wonder if you want to optimize this whether we couldn't
> do much better with "test_commit_bulk", maybe by teaching it a small set
> of new tricks.
>
> I.e. if I do:
>
> 	git fast-export --all
>
> At the end of the setup test it seems we just end up with refs with
> names that correspond to their contents, and with double newlines in
> them or whatever. This is a lot of "grep", "sed", "tr" etc. just to end
> up with that.
>
> So maybe we can create them as a patch, possibly with some slight "sed"
> munging on the input stream, just just teach it to accept a "ref prefix"
> and "commit message contents". That could just be an argument that you
> "$(printf "...")", so we don't even need a sub-process....

The files are used later for verification, so their contents can't just
be passed on via parameters.

Had a similar idea and spent too much time on creating the four files in
a single awk invocation.  The code was too verbose and yet hard to read
for my taste.

> Also this:
>
>      perl -wE 'say for 1..1024*100' | tee /tmp/x | perl -nE 'print "in: $_"; exit 1 if $_ == 512'; tail -n 1 /tmp/x
>
> Isn't deterministic. Now, in this case I doubt it matters, but it's nice
> to have intermediate files in the test suite be determanistic, i.e. to
> always have the full content be in the file at the top after the "top".

Whoa, such a one-liner is a good argument for banishing Perl.

So to rephrase it in a way that I can understand, you say that something
like this:

	$ cd /tmp; seq 100000 | tee x | head -1 >/dev/null; wc -l x

... will probably report less than 100000 lines because the downpipe
command ends the whole thing early.

> With a "tee" you need to worry about the "append_cr" function it's being
> piped in stopping the stdin.
>
> I don't think it matters in this case, but in general as a pattern: I do
> fear the "tee" a bit :)

Right, append_cr reads until EOF.

René


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 4/1] t3920: replace two cats with a tee
  2022-12-03 17:22         ` René Scharfe
@ 2022-12-04  9:34           ` Ævar Arnfjörð Bjarmason
  2022-12-04 16:39             ` Eric Sunshine
  0 siblings, 1 reply; 25+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-12-04  9:34 UTC (permalink / raw)
  To: René Scharfe
  Cc: Eric Sunshine, Johannes Sixt, Philippe Blain, Git Mailing List,
	Junio C Hamano


On Sat, Dec 03 2022, René Scharfe wrote:

> Am 03.12.22 um 13:53 schrieb Ævar Arnfjörð Bjarmason:
>>
>> On Sat, Dec 03 2022, René Scharfe wrote:
>>
>>> Am 03.12.22 um 06:09 schrieb Eric Sunshine:
>>>> On Fri, Dec 2, 2022 at 11:51 AM René Scharfe <l.s.r@web.de> wrote:
>>>>> Use tee(1) to replace two calls of cat(1) for writing files with
>>>>> different line endings.  That's shorter and spawns less processes.
>>>>> [...]
>>>>> Signed-off-by: René Scharfe <l.s.r@web.de>
>>>>> ---
>>>>> diff --git a/t/t3920-crlf-messages.sh b/t/t3920-crlf-messages.sh
>>>>> @@ -9,8 +9,7 @@ LIB_CRLF_BRANCHES=""
>>>>>  create_crlf_ref () {
>>>>> -       cat >.crlf-orig-$branch.txt &&
>>>>> -       cat .crlf-orig-$branch.txt | append_cr >.crlf-message-$branch.txt &&
>>>>> +       tee .crlf-orig-$branch.txt | append_cr >.crlf-message-$branch.txt &&
>>>>
>>>> This feels slightly magical and more difficult to reason about than
>>>> using simple redirection to eliminate the second `cat`. Wouldn't this
>>>> work just as well?
>>>>
>>>>     cat >.crlf-orig-$branch.txt &&
>>>>     append_cr <.crlf-orig-$branch.txt >.crlf-message-$branch.txt &&
>>>
>>> It would work, of course, but this is the exact use case for tee(1).  No
>>> repetition, no extra redirection symbols, just an nicely fitting piece
>>> of pipework.  Don't fear the tee! ;-)
>>>
>>> (I'm delighted to learn from https://en.wikipedia.org/wiki/Tee_(command)
>>> that PowerShell has a tee command as well.)
>>
>> I don't really care, but I must say I agree with Eric here. Not having
>> surprising patterns in the test suite has a value of its own.
>
> That's a good general guideline, but I wouldn't have expected a pipe
> with three holes to startle anyone. *shrug*

It's more that you're used to seeing one thing, the "cat >in" at the
start of a function is a common pattern.

Then it takes some time to stop and grok an a new pattern. If I was
hacking on a function like that I'd probably stop to try to understand
"why", even though I understood the "what".

I'd then find it was to try to optimize things on Windows a bit... :)

I'm not saying it's not worth it in this case, just pointing out that
boring "standard" patterns have a value of their own in us collectively
understanding them, which has a value of its own. Whether optimizing a
test case outweighs that is another matter (sometimes it would).

>> In this case I wonder if you want to optimize this whether we couldn't
>> do much better with "test_commit_bulk", maybe by teaching it a small set
>> of new tricks.
>>
>> I.e. if I do:
>>
>> 	git fast-export --all
>>
>> At the end of the setup test it seems we just end up with refs with
>> names that correspond to their contents, and with double newlines in
>> them or whatever. This is a lot of "grep", "sed", "tr" etc. just to end
>> up with that.
>>
>> So maybe we can create them as a patch, possibly with some slight "sed"
>> munging on the input stream, just just teach it to accept a "ref prefix"
>> and "commit message contents". That could just be an argument that you
>> "$(printf "...")", so we don't even need a sub-process....
>
> The files are used later for verification, so their contents can't just
> be passed on via parameters.
>
> Had a similar idea and spent too much time on creating the four files in
> a single awk invocation.  The code was too verbose and yet hard to read
> for my taste.

Hah, I didn't try. Just a suggestion in case it made sense :)

>> Also this:
>>
>>      perl -wE 'say for 1..1024*100' | tee /tmp/x | perl -nE 'print "in: $_"; exit 1 if $_ == 512'; tail -n 1 /tmp/x
>>
>> Isn't deterministic. Now, in this case I doubt it matters, but it's nice
>> to have intermediate files in the test suite be determanistic, i.e. to
>> always have the full content be in the file at the top after the "top".
>
> Whoa, such a one-liner is a good argument for banishing Perl.
>
> So to rephrase it in a way that I can understand, you say that something
> like this:
>
> 	$ cd /tmp; seq 100000 | tee x | head -1 >/dev/null; wc -l x
>
> ... will probably report less than 100000 lines because the downpipe
> command ends the whole thing early.

Yes, the "perl" line was just a quick demo hack.

But the point is that the initial perl process on the LHS will be killed
with a SIGPIPE as the "perl" on the RHS stops and a SIGPIPE is
propagated up the chain.

I don't think it matters in this case, but just pointing out that it
*is* an edge case this sort of pattern introduces.

I've sometimes resorted to recursively diffing the trash directories of
two test runs to see if they're the same. E.g. I've caught cases where
the stderr of programs unexpectedly changes, but we had no test coverage
for it.

I think it's good to avoid patterns in general that make test runs
nondeterministic.

In this case it's only nondeterministic on failure, so it's probably
fine.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 4/1] t3920: replace two cats with a tee
  2022-12-04  9:34           ` Ævar Arnfjörð Bjarmason
@ 2022-12-04 16:39             ` Eric Sunshine
  0 siblings, 0 replies; 25+ messages in thread
From: Eric Sunshine @ 2022-12-04 16:39 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: René Scharfe, Johannes Sixt, Philippe Blain,
	Git Mailing List, Junio C Hamano

On Sun, Dec 4, 2022 at 4:41 AM Ævar Arnfjörð Bjarmason <avarab@gmail.com> wrote:
> On Sat, Dec 03 2022, René Scharfe wrote:
> > Am 03.12.22 um 13:53 schrieb Ævar Arnfjörð Bjarmason:
> >> On Sat, Dec 03 2022, René Scharfe wrote:
> >>> Am 03.12.22 um 06:09 schrieb Eric Sunshine:
> >>>> On Fri, Dec 2, 2022 at 11:51 AM René Scharfe <l.s.r@web.de> wrote:
> >>>>> -       cat >.crlf-orig-$branch.txt &&
> >>>>> -       cat .crlf-orig-$branch.txt | append_cr >.crlf-message-$branch.txt &&
> >>>>> +       tee .crlf-orig-$branch.txt | append_cr >.crlf-message-$branch.txt &&
> >>>>
> >>>> This feels slightly magical and more difficult to reason about than
> >>>> using simple redirection to eliminate the second `cat`. Wouldn't this
> >>>> work just as well?
> >>>>
> >>>>     cat >.crlf-orig-$branch.txt &&
> >>>>     append_cr <.crlf-orig-$branch.txt >.crlf-message-$branch.txt &&
> >>>
> >>> It would work, of course, but this is the exact use case for tee(1).  No
> >>> repetition, no extra redirection symbols, just an nicely fitting piece
> >>> of pipework.  Don't fear the tee! ;-)
> >>
> >> I don't really care, but I must say I agree with Eric here. Not having
> >> surprising patterns in the test suite has a value of its own.
> >
> > That's a good general guideline, but I wouldn't have expected a pipe
> > with three holes to startle anyone. *shrug*
>
> It's more that you're used to seeing one thing, the "cat >in" at the
> start of a function is a common pattern.
>
> Then it takes some time to stop and grok an a new pattern. If I was
> hacking on a function like that I'd probably stop to try to understand
> "why", even though I understood the "what".
>
> I'm not saying it's not worth it in this case, just pointing out that
> boring "standard" patterns have a value of their own in us collectively
> understanding them, which has a value of its own. Whether optimizing a
> test case outweighs that is another matter (sometimes it would).

Perhaps my experience is atypical, but in decades of using Unix, my
use of `tee` can (probably) be counted on a single finger, so the
patch, as implemented, did have higher cognitive load for me than a
patch using simple redirection would have had. Anyhow, I mentioned the
redirection approach, not to ask for a change, but only in case you
had overlooked the (to me) simpler approach. I didn't expect it to
spark so much discussion (though I do agree with everything Ævar has
said about following established patterns).

That said, I'm still rather unclear on the purpose of this patch. In a
sense, it feels like mere churn for 1/100 of a second gain (assuming
I'm reading the `hyperfine` output correctly).

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 2/1] t3920: support CR-eating grep
  2022-12-02 16:51 ` [PATCH 2/1] t3920: support CR-eating grep René Scharfe
  2022-12-02 23:14   ` Philippe Blain
  2022-12-02 23:32   ` Eric Sunshine
@ 2022-12-05  1:08   ` Junio C Hamano
  2022-12-05  8:28     ` René Scharfe
  2 siblings, 1 reply; 25+ messages in thread
From: Junio C Hamano @ 2022-12-05  1:08 UTC (permalink / raw)
  To: René Scharfe; +Cc: Johannes Sixt, Philippe Blain, Git Mailing List

René Scharfe <l.s.r@web.de> writes:

>  	grep 'Subject' .crlf-orig-$branch.txt | tr '\n' ' ' | sed 's/[ ]*$//' | tr -d '\n' >.crlf-subject-$branch.txt &&
> -	{ grep 'Body' .crlf-message-$branch.txt >.crlf-body-$branch.txt || true; } &&
> +	grep 'Body' .crlf-orig-$branch.txt | append_cr >.crlf-body-$branch.txt &&

Doesn't append_cr unconditionally adds CR at the end?  Do we need to
touch this test again when "grep" gets fixed on the platform?

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 2/1] t3920: support CR-eating grep
  2022-12-05  1:08   ` Junio C Hamano
@ 2022-12-05  8:28     ` René Scharfe
  2022-12-05  9:32       ` Junio C Hamano
  0 siblings, 1 reply; 25+ messages in thread
From: René Scharfe @ 2022-12-05  8:28 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Johannes Sixt, Philippe Blain, Git Mailing List

Am 05.12.22 um 02:08 schrieb Junio C Hamano:
> René Scharfe <l.s.r@web.de> writes:
>
>>  	grep 'Subject' .crlf-orig-$branch.txt | tr '\n' ' ' | sed 's/[ ]*$//' | tr -d '\n' >.crlf-subject-$branch.txt &&
>> -	{ grep 'Body' .crlf-message-$branch.txt >.crlf-body-$branch.txt || true; } &&
>> +	grep 'Body' .crlf-orig-$branch.txt | append_cr >.crlf-body-$branch.txt &&
>
> Doesn't append_cr unconditionally adds CR at the end?

It does.

> Do we need to
> touch this test again when "grep" gets fixed on the platform?

Depends on the meaning of "fixed".  If it stops removing CRs then this
line is unaffected -- .crlf-orig-$branch.txt contains no CRs.  If it
starts adding CRs then we'd have a problem with all grep invocations,
which was addressed by 4d715ac05c (Windows: a test_cmp that is agnostic
to random LF <> CRLF conversions, 2013-10-26).

René

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 2/1] t3920: support CR-eating grep
  2022-12-05  8:28     ` René Scharfe
@ 2022-12-05  9:32       ` Junio C Hamano
  2022-12-05 10:43         ` René Scharfe
  0 siblings, 1 reply; 25+ messages in thread
From: Junio C Hamano @ 2022-12-05  9:32 UTC (permalink / raw)
  To: René Scharfe; +Cc: Johannes Sixt, Philippe Blain, Git Mailing List

René Scharfe <l.s.r@web.de> writes:

> Depends on the meaning of "fixed".  If it stops removing CRs then this
> line is unaffected -- .crlf-orig-$branch.txt contains no CRs.

OK, that "fix" was what I was worried about and if there is no
problem with the input we use, that is good ;-)

> If it
> starts adding CRs then we'd have a problem with all grep invocations,
> which was addressed by 4d715ac05c (Windows: a test_cmp that is agnostic
> to random LF <> CRLF conversions, 2013-10-26).

Yeah, but I thought that the unspoken motivation behind recent
changes are so that we do not have to rely on "the differences
between CRLF and LF do not matter" version of test_cmp?

Thanks.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 2/1] t3920: support CR-eating grep
  2022-12-05  9:32       ` Junio C Hamano
@ 2022-12-05 10:43         ` René Scharfe
  0 siblings, 0 replies; 25+ messages in thread
From: René Scharfe @ 2022-12-05 10:43 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Johannes Sixt, Philippe Blain, Git Mailing List

Am 05.12.22 um 10:32 schrieb Junio C Hamano:
> René Scharfe <l.s.r@web.de> writes:
>
>> Depends on the meaning of "fixed".  If it stops removing CRs then this
>> line is unaffected -- .crlf-orig-$branch.txt contains no CRs.
>
> OK, that "fix" was what I was worried about and if there is no
> problem with the input we use, that is good ;-)
>
>> If it
>> starts adding CRs then we'd have a problem with all grep invocations,
>> which was addressed by 4d715ac05c (Windows: a test_cmp that is agnostic
>> to random LF <> CRLF conversions, 2013-10-26).
>
> Yeah, but I thought that the unspoken motivation behind recent
> changes are so that we do not have to rely on "the differences
> between CRLF and LF do not matter" version of test_cmp?

The patch currently enables the removal of mingw_test_cmp; its commit
message mentions it in passing ("[...] especially since this is the only
test that needs it.").  If grep (or bash) would be "fixed" to add CRs
then we'd have to deal with that damage e.g. by keeping mingw_test_cmp.

René

^ permalink raw reply	[flat|nested] 25+ messages in thread

end of thread, other threads:[~2022-12-05 10:43 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-11-21 17:58 [PATCH] t3920: don't ignore errors of more than one command with `|| true` Johannes Sixt
2022-11-21 22:56 ` René Scharfe
2022-11-22  0:53   ` Junio C Hamano
2022-11-22 18:28 ` Philippe Blain
2022-11-22 22:24 ` Ævar Arnfjörð Bjarmason
2022-11-22 22:37   ` Johannes Sixt
2022-11-22 22:57     ` Ævar Arnfjörð Bjarmason
2022-11-23  0:55       ` Junio C Hamano
2022-12-02 16:51 ` [PATCH 2/1] t3920: support CR-eating grep René Scharfe
2022-12-02 23:14   ` Philippe Blain
2022-12-03  7:09     ` René Scharfe
2022-12-02 23:32   ` Eric Sunshine
2022-12-03  7:12     ` René Scharfe
2022-12-05  1:08   ` Junio C Hamano
2022-12-05  8:28     ` René Scharfe
2022-12-05  9:32       ` Junio C Hamano
2022-12-05 10:43         ` René Scharfe
2022-12-02 16:51 ` [PATCH 3/1] t3920: simplify redirection of loop output René Scharfe
2022-12-02 16:51 ` [PATCH 4/1] t3920: replace two cats with a tee René Scharfe
2022-12-03  5:09   ` Eric Sunshine
2022-12-03  8:43     ` René Scharfe
2022-12-03 12:53       ` Ævar Arnfjörð Bjarmason
2022-12-03 17:22         ` René Scharfe
2022-12-04  9:34           ` Ævar Arnfjörð Bjarmason
2022-12-04 16:39             ` Eric Sunshine

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).