git@vger.kernel.org list mirror (unofficial, one of many)
From: Jeff King <peff@peff.net>
To: Thomas Braun <thomas.braun@virtuell-zuhause.de>
Cc: Derrick Stolee <dstolee@microsoft.com>, git@vger.kernel.org
Subject: Re: [PATCH 0/5] handling 4GB .idx files
Date: Tue, 1 Dec 2020 06:23:28 -0500
Message-ID: <X8YnsGsUl53OKFno@coredump.intra.peff.net>
In-Reply-To: <1403797985.37893.1606777048311@ox.hosteurope.de>

On Mon, Nov 30, 2020 at 11:57:27PM +0100, Thomas Braun wrote:

> Below is what I came up with. It passes here. I've replaced awk with
> cut from the original draft, and also moved the perl script out of the
> test as I think the quoting is getting way too messy otherwise. And
> I've added --no-dangling to git fsck as otherwise it takes forever to
> output the obvious dangling blobs. The unpack limit is mostly for
> testing the test itself with a smaller amount of blobs. But I still
> think it is worthwhile to force everything into a pack.

I think you can get rid of some of the quoting by using perl directly as
the interpreter, rather than a shell script that only invokes it with
-e. See below.

> --- a/t/t1600-index.sh
> +++ b/t/t1600-index.sh

I don't think this should go in t1600; that's about testing the
.git/index file, not a pack .idx. Probably t5302 would be more
appropriate.

> @@ -97,4 +97,34 @@ test_expect_success 'index version config precedence' '
>  	test_index_version 0 true 2 2
>  '
>  
> +{
> +	echo "#!$SHELL_PATH"
> +	cat <<'EOF'
> +	   "$PERL_PATH" -e '
> +		for (0..154_000_000) {
> +			print "blob\n";
> +			print "data <<EOF\n";
> +			print "$_\n";
> +			print "EOF\n";
> +		} '
> +EOF
> +
> +} >dump
> +chmod +x dump

You can simplify this a bit with write_script, as well. And we do prefer
to put this stuff in a test block, so verbosity, etc, is handled
correctly.
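
For anyone who hasn't used write_script: roughly speaking, it writes a
"#!" line naming the interpreter, copies stdin after it, and marks the
result executable. The function below is a simplified stand-in under a
made-up name (write_script_sketch), not the test suite's own helper, and
its /bin/sh default is an assumption for illustration. The demo emits
three blobs rather than 154 million and is skipped if perl is not on the
PATH:

```shell
# Simplified stand-in for the test suite's write_script helper.
# The name and the /bin/sh default are hypothetical.
# $1 = file to create, $2 = interpreter path (optional)
write_script_sketch () {
	{
		printf '#!%s\n' "${2:-/bin/sh}" &&
		cat
	} >"$1" &&
	chmod +x "$1"
}

# Scaled-down generator: perl is the interpreter, so the body needs no
# shell quoting around a -e argument.
if perl_path=$(command -v perl)
then
	write_script_sketch dump-small "$perl_path" <<\EOF &&
for (0..2) {
	print "blob\n";
	print "data <<EOF\n";
	print "$_\n";
	print "EOF\n";
}
EOF
	./dump-small
fi
```

The output is a fast-import stream of the same shape as the one in the
patch, so it can be piped to "git fast-import" in a scratch repository.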

I didn't let it run to completion, but something like this seems to
work:

diff --git a/t/t1600-index.sh b/t/t1600-index.sh
index 6d83aaf8a4..a4c1dc0f0a 100755
--- a/t/t1600-index.sh
+++ b/t/t1600-index.sh
@@ -97,23 +97,16 @@ test_expect_success 'index version config precedence' '
 	test_index_version 0 true 2 2
 '
 
-{
-	echo "#!$SHELL_PATH"
-	cat <<'EOF'
-	   "$PERL_PATH" -e '
-		for (0..154_000_000) {
-			print "blob\n";
-			print "data <<EOF\n";
-			print "$_\n";
-			print "EOF\n";
-		} '
-EOF
-
-} >dump
-chmod +x dump
-
 test_expect_success EXPENSIVE,PERL 'Test 4GB boundary for the index' '
 	test_config fastimport.unpacklimit 0 &&
+	write_script dump "$PERL_PATH" <<-\EOF &&
+	for (0..154_000_000) {
+		print "blob\n";
+		print "data <<EOF\n";
+		print "$_\n";
+		print "EOF\n";
+	}
+	EOF
 	./dump | git fast-import &&
 	blob=$(echo 0 | git hash-object --stdin) &&
 	git cat-file blob $blob >actual &&
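
As a sanity check on the 154_000_000 figure, here is a rough estimate of
the resulting .idx size. It assumes the version 2 .idx layout with
20-byte SHA-1 hashes and ignores the optional table of 8-byte offsets,
so treat it as a lower bound. The function name is made up for
illustration, and the arithmetic needs a shell with 64-bit integers:

```shell
# Lower-bound size in bytes of a version 2 pack .idx.
# $1 = number of objects, $2 = hash length in bytes (20 for SHA-1)
idx_v2_size () {
	# 8-byte header + 256-entry fanout table (4 bytes per entry)
	# + per object: hash + 4-byte CRC32 + 4-byte offset
	# + trailer: pack checksum + idx checksum (one hash each)
	echo $(( 8 + 1024 + $1 * ($2 + 4 + 4) + 2 * $2 ))
}

objects=154000001	# the loop runs over 0..154_000_000 inclusive
limit=4294967296	# 2^32 bytes

size=$(idx_v2_size "$objects" 20)
echo "estimated idx size: $size bytes"
if test "$size" -gt "$limit"
then
	echo "past the 4GB boundary by $(( size - limit )) bytes"
else
	echo "still below the 4GB boundary"
fi
```

At 28 bytes per object, the loop count puts the index only slightly past
2^32 bytes, which is presumably why it was chosen.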

> +test_expect_success EXPENSIVE,PERL 'Test 4GB boundary for the index' '

You can drop the PERL prereq. Even without it set, we assume that we can
do basic perl one-liners that would work even in old versions of perl.

I'm not sure if EXPENSIVE is the right ballpark, or if we'd want a
VERY_EXPENSIVE. On my machine, the whole test suite for v2.29.0 takes 64
seconds to run, and setting GIT_TEST_LONG=1 bumps that to 103s. It got a
bit worse since then, as t7900 adds an EXPENSIVE test that takes ~200s
(it's not strictly additive, since we can work in parallel on other
tests for the first bit, but still, yuck).

So we're looking at 2-3x to run the expensive tests now. This new one
would be 20x or more. I'm not sure if anybody would care or not (i.e.,
whether anyone actually runs the whole suite with this flag). I thought
we did for some CI job, but it looks like it's just the one-off in
t5608.

> +	git cat-file blob $final &&
> +	git cat-file blob fffffff &&

This final cat-file may be a problem when tested with SHA-256. You are
relying on the fact that there is exactly one object that matches seven
f's as its prefix. That may be true for SHA-1, but if so it's mostly
luck. Seven hex digits is only 28 bits, which is ~268M values. For 154M
objects, we'd expect an average of 0.57 objects per 7-digit prefix. So I
wouldn't be at all surprised if there are two of them for SHA-256.
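
A rough sketch of that arithmetic, assuming object names are uniformly
distributed and using a Poisson approximation for how many land on any
one prefix (neither assumption is anything git guarantees, and the
function name is made up for illustration):

```shell
# Statistics for how many of N objects share one particular hex prefix.
# $1 = number of objects, $2 = prefix length in hex digits
prefix_stats () {
	awk -v n="$1" -v d="$2" 'BEGIN {
		mean = n / 2 ^ (4 * d)       # expected objects per prefix
		p0 = exp(-mean)              # no object has the prefix
		p1 = mean * exp(-mean)       # exactly one object has it
		p2 = 1 - p0 - p1             # two or more: abbreviation is ambiguous
		printf "mean=%.2f p0=%.2f p1=%.2f p2plus=%.2f\n", mean, p0, p1, p2
	}'
}

prefix_stats 154000000 7
```

Under those assumptions the mean matches the 0.57 above, and "exactly one
match" comes out at only about one chance in three, with "no match at
all" the most likely outcome for any given 7-digit prefix.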

I'm also not sure what it's testing that the $final one isn't.

-Peff
