git@vger.kernel.org mailing list mirror (one of many)
From: Derrick Stolee <stolee@gmail.com>
To: Jeff King <peff@peff.net>,
	Thomas Braun <thomas.braun@virtuell-zuhause.de>
Cc: Derrick Stolee <dstolee@microsoft.com>, git@vger.kernel.org
Subject: Re: t7900's new expensive test
Date: Tue, 1 Dec 2020 15:55:00 -0500
Message-ID: <373f3dfe-828b-430d-b88e-5e23302090cb@gmail.com>
In-Reply-To: <X8YrbDpC9/EjRr95@coredump.intra.peff.net>

On 12/1/2020 6:39 AM, Jeff King wrote:
> On Tue, Dec 01, 2020 at 06:23:28AM -0500, Jeff King wrote:
> 
>> I'm not sure if EXPENSIVE is the right ballpark, or if we'd want a
>> VERY_EXPENSIVE. On my machine, the whole test suite for v2.29.0 takes 64
>> seconds to run, and setting GIT_TEST_LONG=1 bumps that to 103s. It got a
>> bit worse since then, as t7900 adds an EXPENSIVE test that takes ~200s
>> (it's not strictly additive, since we can work in parallel on other
>> tests for the first bit, but still, yuck).
> 
> Since Stolee is on the cc and has already seen me complaining about his
> test, I guess I should expand a bit. ;)

Ha. I apologize for causing pain here. My thought was that GIT_TEST_LONG=1
was only used by someone really willing to wait, or someone specifically
trying to investigate a problem that only triggers on very large cases.

In that sense, it's not so much intended as a frequently-run regression
test, but a "run this if you are messing with this area" kind of thing.
Perhaps there is a different pattern to use here?

> There are some small wins possible (e.g., using "commit --quiet" seems
> to shave off ~8s when we don't even think about writing a diff), but
> fundamentally the issue is that it just takes a long time to "git add"
> the 5.2GB worth of random data. I almost wonder if it would be worth it
> to hard-code the known sha1 and sha256 names of the blobs, and write
> them straight into the appropriate loose object file. I guess that is
> tricky, though, because it actually needs to be a zlib stream, not just
> the output of "test-tool genrandom".
>
> Though speaking of which, another easy win might be setting
> core.compression to "0". We know the random data won't compress anyway,
> so there's no point in spending cycles on zlib.

The intention is mostly to expand the data beyond two gigabytes, so
dropping compression to get there seems like a good idea. If we are
not compressing at all, then perhaps we can reliably cut ourselves
closer to the 2GB limit instead of overshooting as a precaution.
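
To put rough numbers on that margin, here is a small shell sketch. It
assumes the limit in question is 2 GiB (2^31 bytes), which is an
assumption rather than something stated above, and it reuses the chunk
size from the genrandom call in the quoted test. The variable names are
only for illustration.

```shell
# Assumption: "2GB" means 2 GiB = 2^31 bytes.
limit=$((2 * 1024 * 1024 * 1024))

# Chunk size taken from the genrandom call in the quoted test.
chunk=$((512 * 1024 * 1024 + 1))

# Total bytes appended to "big" by one loop of 5 chunks (current test)
# and by a hypothetical loop of 4 chunks.
five=$((5 * chunk))
four=$((4 * chunk))

echo "limit:    $limit bytes"
echo "5 chunks: $five bytes ($((five - limit)) over the limit)"
echo "4 chunks: $four bytes ($((four - limit)) over the limit)"
```

If that assumption holds, four chunks would already land just over the
limit, while five overshoot by roughly 512 MiB. Whether such a thin
margin is safe in practice would need checking against the actual
pack-file size.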
 
> Doing this:
> 
> diff --git a/t/t7900-maintenance.sh b/t/t7900-maintenance.sh
> index d9e68bb2bf..849c6d1361 100755
> --- a/t/t7900-maintenance.sh
> +++ b/t/t7900-maintenance.sh
> @@ -239,6 +239,8 @@ test_expect_success 'incremental-repack task' '
>  '
>  
>  test_expect_success EXPENSIVE 'incremental-repack 2g limit' '
> +	test_config core.compression 0 &&
> +
>  	for i in $(test_seq 1 5)
>  	do
>  		test-tool genrandom foo$i $((512 * 1024 * 1024 + 1)) >>big ||
> @@ -257,7 +259,7 @@ test_expect_success EXPENSIVE 'incremental-repack 2g limit' '
>  		return 1
>  	done &&
>  	git add big &&
> -	git commit -m "Add big file (2)" &&
> +	git commit -qm "Add big file (2)" &&
>  
>  	# ensure any possible loose objects are in a pack-file
>  	git maintenance run --task=loose-objects &&
> 
> seems to shave off ~140s from the test. I think we could get a little
> more by cleaning up the enormous objects, too (they end up causing the
> subsequent test to run slower, too, though perhaps it was intentional to
> impact downstream tests).

Cutting out 70% seems like a great idea. I don't think it was super
intentional to slow down those tests.

Thanks,
-Stolee



