git@vger.kernel.org mailing list mirror (one of many)
From: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
To: "René Scharfe" <l.s.r@web.de>
Cc: Eric Sunshine <sunshine@sunshineco.com>,
	Johannes Sixt <j6t@kdbg.org>,
	Philippe Blain <levraiphilippeblain@gmail.com>,
	Git Mailing List <git@vger.kernel.org>,
	Junio C Hamano <gitster@pobox.com>
Subject: Re: [PATCH 4/1] t3920: replace two cats with a tee
Date: Sun, 04 Dec 2022 10:34:39 +0100
Message-ID: <221204.86cz8zecam.gmgdl@evledraar.gmail.com>
In-Reply-To: <491ad25c-1cf3-98dd-f7aa-e8d1f24c8cd0@web.de>


On Sat, Dec 03 2022, René Scharfe wrote:

> Am 03.12.22 um 13:53 schrieb Ævar Arnfjörð Bjarmason:
>>
>> On Sat, Dec 03 2022, René Scharfe wrote:
>>
>>> Am 03.12.22 um 06:09 schrieb Eric Sunshine:
>>>> On Fri, Dec 2, 2022 at 11:51 AM René Scharfe <l.s.r@web.de> wrote:
>>>>> Use tee(1) to replace two calls of cat(1) for writing files with
>>>>> different line endings.  That's shorter and spawns less processes.
>>>>> [...]
>>>>> Signed-off-by: René Scharfe <l.s.r@web.de>
>>>>> ---
>>>>> diff --git a/t/t3920-crlf-messages.sh b/t/t3920-crlf-messages.sh
>>>>> @@ -9,8 +9,7 @@ LIB_CRLF_BRANCHES=""
>>>>>  create_crlf_ref () {
>>>>> -       cat >.crlf-orig-$branch.txt &&
>>>>> -       cat .crlf-orig-$branch.txt | append_cr >.crlf-message-$branch.txt &&
>>>>> +       tee .crlf-orig-$branch.txt | append_cr >.crlf-message-$branch.txt &&
>>>>
>>>> This feels slightly magical and more difficult to reason about than
>>>> using simple redirection to eliminate the second `cat`. Wouldn't this
>>>> work just as well?
>>>>
>>>>     cat >.crlf-orig-$branch.txt &&
>>>>     append_cr <.crlf-orig-$branch.txt >.crlf-message-$branch.txt &&
>>>
>>> It would work, of course, but this is the exact use case for tee(1).  No
>>> repetition, no extra redirection symbols, just a nicely fitting piece
>>> of pipework.  Don't fear the tee! ;-)
>>>
>>> (I'm delighted to learn from https://en.wikipedia.org/wiki/Tee_(command)
>>> that PowerShell has a tee command as well.)
>>
>> I don't really care, but I must say I agree with Eric here. Not having
>> surprising patterns in the test suite has a value of its own.
>
> That's a good general guideline, but I wouldn't have expected a pipe
> with three holes to startle anyone. *shrug*

It's more that you're used to seeing one thing: the "cat >in" at the
start of a function is a common pattern.

Then it takes some time to stop and grok a new pattern. If I was
hacking on a function like that I'd probably stop to try to understand
"why", even though I understood the "what".

I'd then find it was to try to optimize things on Windows a bit... :)

I'm not saying it's not worth it in this case, just pointing out that
boring "standard" patterns have a value of their own, because we all
collectively understand them. Whether optimizing a test case outweighs
that is another matter (sometimes it would).
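
For concreteness, the two shapes being compared look roughly like the
sketch below. It is only an illustration: the append_cr here is a
stand-in written with awk rather than the test suite's own helper, and
the function and file names are made up. Both variants should leave the
same two files behind.

```shell
#!/bin/sh
# Stand-in for the test suite's append_cr helper (an assumption for this
# sketch): terminate every input line with CR LF.
append_cr () {
	awk '{ printf "%s\r\n", $0 }'
}

dir=$(mktemp -d) && cd "$dir" || exit 1

# Variant A: the familiar "cat >file" opening, then a plain redirection.
write_two_step () {
	cat >orig-a.txt &&
	append_cr <orig-a.txt >crlf-a.txt
}

# Variant B: a single pipeline; tee saves the original while passing it on.
write_with_tee () {
	tee orig-b.txt | append_cr >crlf-b.txt
}

printf 'Subject\n\nBody\n' | write_two_step
printf 'Subject\n\nBody\n' | write_with_tee

# Both variants should have produced byte-identical files.
cmp orig-a.txt orig-b.txt &&
cmp crlf-a.txt crlf-b.txt &&
echo "same files"
```

So the result is the same when everything succeeds; the difference is
only in which shape a reader recognizes at a glance.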

>> In this case I wonder if you want to optimize this whether we couldn't
>> do much better with "test_commit_bulk", maybe by teaching it a small set
>> of new tricks.
>>
>> I.e. if I do:
>>
>> 	git fast-export --all
>>
>> At the end of the setup test it seems we just end up with refs with
>> names that correspond to their contents, and with double newlines in
>> them or whatever. This is a lot of "grep", "sed", "tr" etc. just to end
>> up with that.
>>
>> So maybe we can create them as a patch, possibly with some slight "sed"
>> munging on the input stream, just just teach it to accept a "ref prefix"
>> and "commit message contents". That could just be an argument that you
>> "$(printf "...")", so we don't even need a sub-process....
>
> The files are used later for verification, so their contents can't just
> be passed on via parameters.
>
> Had a similar idea and spent too much time on creating the four files in
> a single awk invocation.  The code was too verbose and yet hard to read
> for my taste.

Hah, I didn't try. Just a suggestion in case it made sense :)

>> Also this:
>>
>>      perl -wE 'say for 1..1024*100' | tee /tmp/x | perl -nE 'print "in: $_"; exit 1 if $_ == 512'; tail -n 1 /tmp/x
>>
>> Isn't deterministic. Now, in this case I doubt it matters, but it's nice
>> to have intermediate files in the test suite be deterministic, i.e. to
>> always have the full content be in the file at the top after the "top".
>
> Whoa, such a one-liner is a good argument for banishing Perl.
>
> So to rephrase it in a way that I can understand, you say that something
> like this:
>
> 	$ cd /tmp; seq 100000 | tee x | head -1 >/dev/null; wc -l x
>
> ... will probably report less than 100000 lines because the downpipe
> command ends the whole thing early.

Yes, the "perl" line was just a quick demo hack.

But the point is that once the "perl" on the RHS exits, the tee in the
middle gets a SIGPIPE on its next write, and then so does the initial
perl process on the LHS, i.e. the SIGPIPE propagates up the chain.

I don't think it matters in this case, but just pointing out that it
*is* an edge case this sort of pattern introduces.
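
A rough sketch of that edge case, next to the two-step form for
comparison. It assumes seq(1) is available and that tee is stopped by
the broken pipe, which is the usual default; the file names "piped" and
"staged" are made up for the demo. How many lines end up in "piped"
depends on timing, so no particular number should be expected.

```shell
#!/bin/sh
dir=$(mktemp -d) && cd "$dir" || exit 1

# Pipeline form: head exits after one line. tee then hits a broken pipe on
# a later write and stops, so "piped" may be cut short.
seq 100000 | tee piped | head -n 1 >/dev/null
piped_lines=$(($(wc -l <piped)))

# Two-step form: the file is written completely before any reader starts,
# so a reader that stops early cannot truncate it.
seq 100000 >staged
head -n 1 <staged >/dev/null
staged_lines=$(($(wc -l <staged)))

echo "piped:  $piped_lines lines (may vary from run to run)"
echo "staged: $staged_lines lines"
```

The "staged" count is always the full 100000; the "piped" count is only
guaranteed not to exceed it.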

I've sometimes resorted to recursively diffing the trash directories of
two test runs to see if they're the same. E.g. I've caught cases where
the stderr of programs unexpectedly changes, but we had no test coverage
for it.
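
The comparison itself is nothing fancy, something like the sketch below.
The fake_run function is a hypothetical stand-in for a real test run; it
just fills a directory with the kind of files a test leaves behind, here
with a differing captured stderr.

```shell
#!/bin/sh
base=$(mktemp -d) || exit 1

# Hypothetical stand-in for one test run: populate a "trash" directory
# with an output file and a captured stderr file.
fake_run () {
	mkdir -p "$1" &&
	printf 'line\n' >"$1/out" &&
	printf '%s\n' "$2" >"$1/err"
}

fake_run "$base/run1" "warning: old wording"
fake_run "$base/run2" "warning: new wording"

# diff -r exits non-zero if any file differs or exists on one side only.
if diff -r "$base/run1" "$base/run2" >"$base/report"
then
	echo "runs left identical files"
else
	echo "runs differ:"
	cat "$base/report"
fi
```

Any nondeterministic intermediate file shows up as noise in such a
report, which is why it helps to keep them stable.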

I think it's good to avoid patterns in general that make test runs
nondeterministic.

In this case it's only nondeterministic on failure, so it's probably
fine.
