Hi Victoria,

[sorry for the long delay, I was being pulled into too many directions,
but am incredibly grateful for your fantastic investigation.]

On Thu, 24 Mar 2022, Victoria Dye wrote:

> Johannes Schindelin via GitGitGadget wrote:
> > Changes since v1:
> >
> >  * In the patch that removed MAKE_TARGETS, a stale comment about that
> >    variable is also removed.
> >  * The comment about set -x has been adjusted because it no longer applies
> >    as-is.
> >  * The commit message of "ci: make it easier to find failed tests' logs in
> >    the GitHub workflow" has been adjusted to motivate the improvement
> >    better.
> >
> >
> > Background
> > ==========
> >
> > Recent patches intended to help readers figure out CI failures much quicker
> > than before. Unfortunately, they haven't been entirely positive for me. For
> > example, they broke the branch protections in Microsoft's fork of Git, where
> > we require Pull Requests to pass a certain set of Checks (which are
> > identified by their names) and therefore caused follow-up work.
> >
> > Using CI and in general making it easier for new contributors is an area I'm
> > passionate about, and one I'd like to see improved.
> >
> >
> > The current situation
> > =====================
> >
> > Let me walk you through the current experience when a PR build fails: I get
> > a notification mail that only says that a certain job failed. There's no
> > indication of which test failed (or was it the build?). I can click on a
> > link at it takes me to the workflow run. Once there, all it says is "Process
> > completed with exit code 1", or even "code 2". Sure, I can click on one of
> > the failed jobs. It even expands the failed step's log (collapsing the other
> > steps). And what do I see there?
> >
> > Let's look at an example of a failed linux-clang (ubuntu-latest) job
> > [https://github.com/git-for-windows/git/runs/4822802185?check_suite_focus=true]:
> >
> > [...]
> > Test Summary Report
> > -------------------
> > t1092-sparse-checkout-compatibility.sh           (Wstat: 256 Tests: 53 Failed: 1)
> >   Failed test:  49
> >   Non-zero exit status: 1
> > t3701-add-interactive.sh                         (Wstat: 0 Tests: 71 Failed: 0)
> >   TODO passed:   45, 47
> > Files=957, Tests=25489, 645 wallclock secs ( 5.74 usr  1.56 sys + 866.28 cusr 364.34 csys = 1237.92 CPU)
> > Result: FAIL
> > make[1]: *** [Makefile:53: prove] Error 1
> > make[1]: Leaving directory '/home/runner/work/git/git/t'
> > make: *** [Makefile:3018: test] Error 2
> >
> >
> > That's it. I count myself lucky not to be a new contributor being faced with
> > something like this.
> >
> > Now, since I am active in the Git project for a couple of days or so, I can
> > make sense of the "TODO passed" label and know that for the purpose of
> > fixing the build failures, I need to ignore this, and that I need to focus
> > on the "Failed test" part instead.
> >
> > I also know that I do not have to get myself an ubuntu-latest box just to
> > reproduce the error, I do not even have to check out the code and run it
> > just to learn what that "49" means.
> >
> > I know, and I do not expect any new contributor, not even most seasoned
> > contributors to know, that I have to patiently collapse the "Run
> > ci/run-build-and-tests.sh" job's log, and instead expand the "Run
> > ci/print-test-failures.sh" job log (which did not fail and hence does not
> > draw any attention to it).
> >
>
> You can count me as one of the ones that, until this series, had no idea
> this information existed in the CI logs. And it definitely would have been
> helpful in figuring out some of the OS-specific bugs I've run into in the
> past. :)

Excellent. I think I have enough accounts to consider it a universal truth
that our CI output _needs_ something like this here patch series.

> > I know, and again: I do not expect many others to know this, that I then
> > have to click into the "Search logs" box (not the regular web browser's
> > search via Ctrl+F!) and type in "not ok" to find the log of the failed test
> > case (and this might still be a "known broken" one that is marked via
> > test_expect_failure and once again needs to be ignored).
> >
> > To be excessively clear: This is not a great experience!
> >
> >
> > Improved output
> > ===============
> >
> > Our previous Azure Pipelines-based CI builds had a much nicer UI, one that
> > even showed flaky tests, and trends e.g. how long the test cases ran. When I
> > ported Git's CI over to GitHub workflows (to make CI more accessible to new
> > contributors), I knew fully well that we would leave this very nice UI
> > behind, and I had hoped that we would get something similar back via new,
> > community-contributed GitHub Actions that can be used in GitHub workflows.
> > However, most likely because we use a home-grown test framework implemented
> > in opinionated POSIX shells scripts, that did not happen.
> >
> > So I had a look at what standards exist e.g. when testing PowerShell
> > modules, in the way of marking up their test output in GitHub workflows, and
> > I was not disappointed: GitHub workflows support "grouping" of output lines,
> > i.e. marking sections of the output as a group that is then collapsed by
> > default and can be expanded. And it is this feature I decided to use in this
> > patch series, along with GitHub workflows' commands to display errors or
> > notices that are also shown on the summary page of the workflow run. Now, in
> > addition to "Process completed with exit code" on the summary page, we also
> > read something like:
> >
> > ⊗ linux-gcc (ubuntu-latest)
> >    failed: t9800.20 submit from detached head
> >
> >
> > Even better, this message is a link, and following that, the reader is
> > presented with something like this
> > [https://github.com/dscho/git/runs/4840190622?check_suite_focus=true]:
> >
> > ⏵ Run ci/run-build-and-tests.sh
> > ⏵ CI setup
> >   + ln -s /home/runner/none/.prove t/.prove
> >   + run_tests=t
> >   + export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=main
> >   + group Build make
> >   + set +x
> > ⏵ Build
> > ⏵ Run tests
> >   === Failed test: t9800-git-p4-basic ===
> > ⏵ ok: t9800.1 start p4d
> > ⏵ ok: t9800.2 add p4 files
> > ⏵ ok: t9800.3 basic git p4 clone
> > ⏵ ok: t9800.4 depot typo error
> > ⏵ ok: t9800.5 git p4 clone @all
> > ⏵ ok: t9800.6 git p4 sync uninitialized repo
> > ⏵ ok: t9800.7 git p4 sync new branch
> > ⏵ ok: t9800.8 clone two dirs
> > ⏵ ok: t9800.9 clone two dirs, @all
> > ⏵ ok: t9800.10 clone two dirs, @all, conflicting files
> > ⏵ ok: t9800.11 clone two dirs, each edited by submit, single git commit
> > ⏵ ok: t9800.12 clone using non-numeric revision ranges
> > ⏵ ok: t9800.13 clone with date range, excluding some changes
> > ⏵ ok: t9800.14 exit when p4 fails to produce marshaled output
> > ⏵ ok: t9800.15 exit gracefully for p4 server errors
> > ⏵ ok: t9800.16 clone --bare should make a bare repository
> > ⏵ ok: t9800.17 initial import time from top change time
> > ⏵ ok: t9800.18 unresolvable host in P4PORT should display error
> > ⏵ ok: t9800.19 run hook p4-pre-submit before submit
> >   Error: failed: t9800.20 submit from detached head
> > ⏵ failure: t9800.20 submit from detached head
> >   Error: failed: t9800.21 submit from worktree
> > ⏵ failure: t9800.21 submit from worktree
> >   === Failed test: t9801-git-p4-branch ===
> >   [...]
> >
> >
> > The "Failed test:" lines are colored in yellow to give a better visual clue
> > about the logs' structure, the "Error:" label is colored in red to draw the
> > attention to the important part of the log, and the "⏵" characters indicate
> > that part of the log is collapsed and can be expanded by clicking on it.
> >
> > To drill down, the reader merely needs to expand the (failed) test case's
> > log by clicking on it, and then study the log. If needed (e.g. when the test
> > case relies on side effects from previous test cases), the logs of preceding
> > test cases can be expanded as well. In this example, when expanding
> > t9800.20, it looks like this (for ease of reading, I cut a few chunks of
> > lines, indicated by "[...]"):
> >
> > [...]
> > ⏵ ok: t9800.19 run hook p4-pre-submit before submit
> >   Error: failed: t9800.20 submit from detached head
> > ⏷ failure: t9800.20 submit from detached head
> >       test_when_finished cleanup_git &&
> >       git p4 clone --dest="$git" //depot &&
> >         (
> >           cd "$git" &&
> >           git checkout p4/master &&
> >           >detached_head_test &&
> >           git add detached_head_test &&
> >           git commit -m "add detached_head" &&
> >           git config git-p4.skipSubmitEdit true &&
> >           git p4 submit &&
> >             git p4 rebase &&
> >             git log p4/master | grep detached_head
> >         )
> >     [...]
> >     Depot paths: //depot/
> >     Import destination: refs/remotes/p4/master
> >
> >     Importing revision 9 (100%)Perforce db files in '.' will be created if missing...
> >     Perforce db files in '.' will be created if missing...
> >
> >     Traceback (most recent call last):
> >       File "/home/runner/work/git/git/git-p4", line 4455, in <module>
> >         main()
> >       File "/home/runner/work/git/git/git-p4", line 4449, in main
> >         if not cmd.run(args):
> >       File "/home/runner/work/git/git/git-p4", line 2590, in run
> >         rebase.rebase()
> >       File "/home/runner/work/git/git/git-p4", line 4121, in rebase
> >         if len(read_pipe("git diff-index HEAD --")) > 0:
> >       File "/home/runner/work/git/git/git-p4", line 297, in read_pipe
> >         retcode, out, err = read_pipe_full(c, *k, **kw)
> >       File "/home/runner/work/git/git/git-p4", line 284, in read_pipe_full
> >         p = subprocess.Popen(
> >       File "/usr/lib/python3.8/subprocess.py", line 858, in __init__
> >         self._execute_child(args, executable, preexec_fn, close_fds,
> >       File "/usr/lib/python3.8/subprocess.py", line 1704, in _execute_child
> >         raise child_exception_type(errno_num, err_msg, err_filename)
> >     FileNotFoundError: [Errno 2] No such file or directory: 'git diff-index HEAD --'
> >     error: last command exited with $?=1
> >     + cleanup_git
> >     + retry_until_success rm -r /home/runner/work/git/git/t/trash directory.t9800-git-p4-basic/git
> >     + nr_tries_left=60
> >     + rm -r /home/runner/work/git/git/t/trash directory.t9800-git-p4-basic/git
> >     + test_path_is_missing /home/runner/work/git/git/t/trash directory.t9800-git-p4-basic/git
> >     + test 1 -ne 1
> >     + test -e /home/runner/work/git/git/t/trash directory.t9800-git-p4-basic/git
> >     + retry_until_success mkdir /home/runner/work/git/git/t/trash directory.t9800-git-p4-basic/git
> >     + nr_tries_left=60
> >     + mkdir /home/runner/work/git/git/t/trash directory.t9800-git-p4-basic/git
> >     + exit 1
> >     + eval_ret=1
> >     + :
> >     not ok 20 - submit from detached head
> >     #
> >     #        test_when_finished cleanup_git &&
> >     #        git p4 clone --dest="$git" //depot &&
> >     #        (
> >     #            cd "$git" &&
> >     #            git checkout p4/master &&
> >     #            >detached_head_test &&
> >     #            git add detached_head_test &&
> >     #            git commit -m "add detached_head" &&
> >     #            git config git-p4.skipSubmitEdit true &&
> >     #            git p4 submit &&
> >     #            git p4 rebase &&
> >     #            git log p4/master | grep detached_head
> >     #        )
> >     #
> >   Error: failed: t9800.21 submit from worktree
> >   [...]
> >
> >
> > Is this the best UI we can have for test failures in CI runs? I hope we can
> > do better. Having said that, this patch series presents a pretty good start,
> > and offers a basis for future improvements.
> >
>
> I think these are really valuable improvements over our current state, but I
> also understand the concerns about performance elsewhere in this thread
> (it's really slow to load for me as well, and scrolling/expanding the log
> groups can be a bit glitchy in my browser). That said, I think there are a
> couple ways you could improve the load time without sacrificing the (very
> helpful) improvements you've made to error log visibility. I experimented a
> bit (example result [1]) and came up with some things that may help:
>
> * group errors by test file, rather than by test case (to reduce
>   parsing/rendering time for lots of groups).

I really would like to avoid that, based on my past experience with
diagnosing test failures. It is definitely helpful if the structure lets
the reader expand individual test cases.

> * print the verbose logs only for the failed test cases (to massively cut
>   down on the size of the log, particularly when there's only a couple
>   failures in a test file with a lot of passing tests).

That's an amazingly simple trick to improve the speed by a ton, indeed.
Thank you for this splendid idea!

> * skip printing the full text of the test in 'finalize_test_case_output'
>   when creating the group, i.e., use '$1' instead of '$*' (in both passing
>   and failing tests, this information is already printed via some other
>   means).
>
> If you wanted to make sure a user could still access the full failure logs
> (i.e., including the "ok" test results), you could print a link to the
> artifacts page as well - that way, all of the information we currently
> provide to users can still be found somewhere.

That's a good point, I added that hint to the output (the link is
unfortunately not available at the time we print that advice).

>
> [1] https://github.com/vdye/git/runs/5666973267
>
> > Johannes Schindelin (9):
> >   ci: fix code style
> >   ci/run-build-and-tests: take a more high-level view
> >   ci: make it easier to find failed tests' logs in the GitHub workflow
> >   ci/run-build-and-tests: add some structure to the GitHub workflow
> >     output
> >   tests: refactor --write-junit-xml code
> >   test(junit): avoid line feeds in XML attributes
> >   ci: optionally mark up output in the GitHub workflow
> >   ci: use `--github-workflow-markup` in the GitHub workflow
> >   ci: call `finalize_test_case_output` a little later
> >
>
> The organization of these commits makes the series a bit confusing to read,
> mainly due to the JUnit changes in the middle. Patches 5-6 don't appear to
> be dependent on patches 1-4, so they could be moved to the beginning (after
> patch 1). With that change, I think this series would flow more smoothly:
> "Cleanup/non-functional change" -> "JUnit XML improvements" -> "Log UX
> improvements".

Great feedback! I changed the order as suggested.

Again, thank you so much for helping me improve the user experience of
Git's CI/PR builds.

Ciao,
Dscho