git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Jeff King <peff@peff.net>
To: Jeff Hostetler <git@jeffhostetler.com>
Cc: Taylor Blau <me@ttaylorr.com>,
	Jeff Hostetler via GitGitGadget <gitgitgadget@gmail.com>,
	git@vger.kernel.org, Jeff Hostetler <jeffhost@microsoft.com>
Subject: Re: [PATCH] t/perf/perf-lib.sh: remove test_times.* at the end test_perf_()
Date: Thu, 7 Oct 2021 22:55:18 -0400	[thread overview]
Message-ID: <YV+zFqi4VmBVJYex@coredump.intra.peff.net> (raw)
In-Reply-To: <3f03ed89-d3db-32ba-3c1f-b8fac7cfb097@jeffhostetler.com>

On Thu, Oct 07, 2021 at 01:49:15PM -0400, Jeff Hostetler wrote:

> Yeah, I don't think I want to keep switching the value of _REPEAT_COUNT
> in the body of the test.  (It did feel a little "against the spirit" of
> the framework.)  I'm in the process of redoing the test to not need
> that.

Sounds good to me. :)

> There's a problem with the perf test assumptions here and I'm curious
> if there's a better way to use the perf-lib that I'm not thinking of.
> 
> When working with big repos (in this case 100K files), the actual
> checkout takes 33 seconds, but the repetitions are fast -- since they
> just print a warning and stop.  In the 1M file case that number is ~7
> minutes for the first instance.)  With the code in min_time.perl
> silently taking the min() of the runs, it looks like the checkout was
> really fast when it wasn't.  That fact gets hidden in the summary report
> printed at the end.

Right. So your option now is basically something like:

  test_perf 'checkout' '
	git reset --hard the_original_state &&
	git checkout
  '

I.e., reset the state _and_ perform the operation you want to time,
within a single test_perf(). (Here it's a literal "git reset", but
obviously it could be anything to adjust the state back to the
baseline). But that reset operation will get lumped into your timing.
That might not matter if it's comparatively cheap, but it might throw
off all your results.

What I'd propose instead is that we ought to have:

  test_perf 'checkout'
            --prepare '
	        git reset --hard the_original_state
	    ' '
	        git checkout
	    '

Having two multi-line snippets is a bit ugly (check out that awful
middle line), but I think this could be added without breaking existing
tests (they just wouldn't have a --prepare option).

If that syntax is too horrendous, we could have:

  # this saves the snippet in a variable internally, and runs
  # it before each trial of the next test_perf(), after which
  # it is discarded
  test_perf_prepare '
          git reset --hard the_original_state
  '

  test_perf 'checkout' '
          git checkout
  '

I think that would be pretty easy to implement, and would solve the most
common form of this problem. And there's plenty of prior art; just about
every decent benchmarking system has a "do this before each trial"
mechanism. Our t/perf suite (as you probably noticed) is rather more
ad-hoc and less mature.

There are cases it doesn't help, though. For instance, in one of the
scripts we measure the time to run "git repack -adb" to generate
bitmaps. But the first run has to do more work, because we can reuse
results for subsequent ones! It would help to "rm -f
objects/pack/*.bitmap", but even that's not entirely fair, as it will be
repacking from a single pack, versus whatever state we started with.

And there I think the whole "take the best run" strategy is hampering
us. These inaccuracies in our timings go unseen, because we don't do any
statistical analysis of the results. Whereas a tool like hyperfine (for
example) will run trials until the mean stabilizes, and then let you
know when there were trials outside of a standard deviation.

I know we're hesitant to introduce dependencies, but I do wonder if we
could have much higher quality perf results if we accepted a dependency
on a tool like that. I'd never want that for the regular test suite, but
I'd my feelings for the perf suite are much looser. I suspect not many
people run it at all, and its main utility is showing off improvements
and looking for broad regressions. It's possible somebody would want to
track down a performance change on a specific obscure platform, but in
general I'd suspect they'd be much better off timing things manually in
such a case.

So there. That was probably more than you wanted to hear, and further
than you want to go right now. In the near-term for the tests you're
interested in, something like the "prepare" feature I outlined above
would probably not be too hard to add, and would address your immediate
problem.

-Peff

  reply	other threads:[~2021-10-08  2:55 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-10-04 22:29 [PATCH] t/perf/perf-lib.sh: remove test_times.* at the end test_perf_() Jeff Hostetler via GitGitGadget
2021-10-05 17:45 ` Taylor Blau
2021-10-06 19:24   ` Jeff King
2021-10-06 19:26     ` Taylor Blau
2021-10-07 17:49     ` Jeff Hostetler
2021-10-08  2:55       ` Jeff King [this message]
2021-10-08  7:47         ` A hard dependency on "hyperfine" for t/perf Ævar Arnfjörð Bjarmason
2021-10-08 17:30         ` [PATCH] t/perf/perf-lib.sh: remove test_times.* at the end test_perf_() Junio C Hamano
2021-10-08 19:57           ` Jeff King
2021-10-10 21:26   ` SZEDER Gábor
2021-10-13 21:09     ` Jeff Hostetler

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=YV+zFqi4VmBVJYex@coredump.intra.peff.net \
    --to=peff@peff.net \
    --cc=git@jeffhostetler.com \
    --cc=git@vger.kernel.org \
    --cc=gitgitgadget@gmail.com \
    --cc=jeffhost@microsoft.com \
    --cc=me@ttaylorr.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).