git@vger.kernel.org mailing list mirror (one of many)
From: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
To: Jonathan Tan <jonathantanmy@google.com>
Cc: git@vger.kernel.org
Subject: Re: [PATCH] ls-files: support --recurse-submodules --stage
Date: Sat, 19 Feb 2022 04:11:30 +0100
Message-ID: <220219.868ru7fsad.gmgdl@evledraar.gmail.com>
In-Reply-To: <20220218223212.1139366-1-jonathantanmy@google.com>


On Fri, Feb 18 2022, Jonathan Tan wrote:

> e77aa336f1 ("ls-files: optionally recurse into submodules", 2016-10-10)
> taught ls-files the --recurse-submodules argument, but only in a limited
> set of circumstances. In particular, --stage was unsupported, perhaps
> because there was no repo_find_unique_abbrev(), which was only
> introduced in 8bb95572b0 ("sha1-name.c: add
> repo_find_unique_abbrev_r()", 2019-04-16). This function is needed for
> using --recurse-submodules with --stage.
>
> Now that we have repo_find_unique_abbrev(), teach support for this
> combination of arguments.
>
> Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
> ---
> I got the similar-hashing object contents from Ævar's work in [1].

Hah! FWIW that was made by this script I hacked up at the time:
	
	#!/usr/bin/env perl
	use v5.32.0;
	use strict;
	use warnings;
	use Digest::SHA qw(sha1_hex sha256_hex);
	
	# Usage:
	## prefix= type=bad git find-colliding-hashes | tee garbage-coll-bad.txt
	## prefix= type=bad want=bad git find-colliding-hashes | tee garbage-coll-bad.txt
	
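	$| = 1; # unbuffered, so matches show up in "tee" right away
	my $s = $ENV{s} // "s"; # seed for the candidate strings
	my %seen;
	my $type = $ENV{type} // "blob";
	my $prefix = $ENV{prefix} // "";
	my $want = $ENV{want} // "";
	# Perl's string increment steps through "t", "u", ..., "z", "aa", ...
	while ($s++) {
		my $str = $prefix . $s;
		my $l = length($str) + 1; # +1 for the trailing newline
		# The buffer git hashes: "<type> <length>\0<content>"
		my $p = "$type $l\0$str\n";
		my $o = sha1_hex($p);
		# Cheap filter first: skip unless the SHA-1 starts with $want
		next if length $want && index($o, $want) != 0;
		my $n = sha256_hex($p);
		my $os = substr($o, 0, 4);
		my $ns = substr($n, 0, 4);
		# Report content whose SHA-1 and SHA-256 share their first 4 hex digits
		if ($os eq $ns) {
			say "hash($str) = [$os, $ns]" . ($seen{$os} ? " SEEN" : "");
			$seen{$os} = 1;
		}
	}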

https://gist.github.com/avar/9e4c2bde7fbdc888b031713065a9eaf6 has some
more colliding blob prefixes, which I generated until I got bored with
it...
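
For anyone following along without Perl, the object ID the script
computes can be sketched in shell. This is only a minimal sketch: it
assumes sha1sum and sha256sum from coreutils are available, and
blob_oid is a made-up helper name, not something in git's test
library. It hashes the same "<type> <length>\0<content>\n" buffer the
script builds, for the blob case:

```shell
# blob_oid <sha1|sha256> <string>
# Hypothetical helper: print the object ID git would assign to a blob
# holding <string> plus a trailing newline (what "echo <string>" writes).
blob_oid () {
	algo=$1
	content=$2
	# Payload length, +1 for the newline. ${#content} counts
	# characters, so this is only right for ASCII input.
	len=$((${#content} + 1))
	printf 'blob %d\0%s\n' "$len" "$content" |
	"${algo}sum" |
	cut -d' ' -f1
}

# Show the 4-character prefix under each hash for the two strings
# the patch uses.
for s in brocdnra brigddsv
do
	printf '%s sha1=%s sha256=%s\n' "$s" \
		"$(blob_oid sha1 "$s" | cut -c1-4)" \
		"$(blob_oid sha256 "$s" | cut -c1-4)"
done
```

With git at hand, "echo <string> | git hash-object --stdin" should
give the same SHA-1 value as "blob_oid sha1 <string>".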

> +test_expect_success '--stage' '
> +	# In order to test hash abbreviation, write two objects that have the
> +	# same first 4 hexadecimal characters in their (SHA-1) hashes.
> +	echo brocdnra >submodule/c &&
> +	git -C submodule commit -am "update c" &&
> +	echo brigddsv >submodule/c &&
> +	git -C submodule commit -am "update c again" &&
> +
> +	cat >expect <<-\EOF &&
> +	100644 6da7 0	.gitmodules
> +	100644 7898 0	a
> +	100644 6178 0	b/b
> +	100644 dead9 0	submodule/c
> +	EOF

This test will break under SHA-256 though, as you can see with:

    GIT_TEST_DEFAULT_HASH=sha256 ./t3007-ls-files-recurse-submodules.sh

So you'll need at least something like:

diff --git a/t/t3007-ls-files-recurse-submodules.sh b/t/t3007-ls-files-recurse-submodules.sh
index 3d2da360d17..0fe69da8dcf 100755
--- a/t/t3007-ls-files-recurse-submodules.sh
+++ b/t/t3007-ls-files-recurse-submodules.sh
@@ -42,10 +42,10 @@ test_expect_success '--stage' '
 	echo brigddsv >submodule/c &&
 	git -C submodule commit -am "update c again" &&
 
-	cat >expect <<-\EOF &&
-	100644 6da7 0	.gitmodules
-	100644 7898 0	a
-	100644 6178 0	b/b
+	cat >expect <<-EOF &&
+	100644 $(git rev-parse --short=4 HEAD:.gitmodules) 0	.gitmodules
+	100644 $(git rev-parse --short=4 HEAD:a) 0	a
+	100644 $(git rev-parse --short=4 HEAD:b/b) 0	b/b
 	100644 dead9 0	submodule/c
 	EOF
 
But then the problem is that one is dead9 and the other is dead6; I was
only trying to find matching 4-char prefixes.
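
On the "<<-\EOF" to "<<-EOF" change in that diff: with the delimiter
quoted, the shell copies the here-doc body verbatim, so the
"$(git rev-parse ...)" calls would end up in "expect" as literal text.
A small sketch of the difference, using "echo" as a stand-in for the
"git rev-parse" call (the "-" in "<<-" only strips leading tabs and is
left out here):

```shell
# Quoted delimiter: no expansion, the body is copied as-is.
quoted=$(cat <<\EOF
100644 $(echo 1234) 0 a
EOF
)

# Unquoted delimiter: command substitutions in the body are run.
unquoted=$(cat <<EOF
100644 $(echo 1234) 0 a
EOF
)

printf '%s\n' "$quoted" "$unquoted"
# prints:
# 100644 $(echo 1234) 0 a
# 100644 1234 0 a
```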

But having indulged in all that, I'm now entirely confused about why any
of this needs to be tested here.

You're adding --stage, which will give us --stage-y output, and it was
previously incompatible with --recurse-submodules. Having the two
combine is good!

But why do we need to test the OID abbreviation at all? Isn't that a bit
too much paranoia? Isn't it sufficient to just do:

    opts="--stage --abbrev=4" &&
    git -C submodule ls-files $opts >expect &&
    git ls-files --recurse-submodules $opts >raw &&
    grep submodule raw >actual &&
    test_cmp expect actual

Or well, then the path won't be the same, but I think you get the
idea.
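
One way to deal with the path difference, as a rough sketch: rewrite
the path column of the submodule's own output before comparing.
prefix_paths is a hypothetical helper, not an existing test-lib
function, and it assumes the usual "<mode> <oid> <stage>\t<path>"
layout of "ls-files --stage" with no quoted paths:

```shell
# prefix_paths <dir>
# Hypothetical helper: read "ls-files --stage" output on stdin and
# prepend "<dir>/" to the path, which is everything after the first tab.
prefix_paths () {
	awk -v dir="$1" '{
		tab = index($0, "\t")
		print substr($0, 1, tab) dir "/" substr($0, tab + 1)
	}'
}

# Stand-in for one line of "git -C submodule ls-files --stage --abbrev=4".
printf '100644 dead9 0\tc\n' | prefix_paths submodule
# prints "100644 dead9 0<TAB>submodule/c"
```

In the test that would be something like
"git -C submodule ls-files $opts >sub && prefix_paths submodule <sub >expect",
going through a file so that git's exit status isn't hidden by a pipe.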

I.e. don't we just want to test that the submodule is indeed included
here, not that some particular feature works in combination with it?

Supposing that repo_find_unique_abbrev() won't work might be a bit too
much paranoia, and I'm more test-happy than most :)

I'd think that if we should test anything it would be more meaningful to
e.g. test the sort order of the returned entries.

Your test case won't disambiguate between index entries being returned
in sort order vs. just "submodules at the end", since "s" sorts after
0, a and b.

Presumably it does the former, but I'd think distinguishing those would
be one meaningful test of actual --recurse-submodules --stage
functionality.
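
A sketch of what such a check could look like, assuming the
superproject also tracks a file that sorts after "submodule/" (say
"z", which is my invention and not in the patch). paths_sorted is
likewise a hypothetical helper; it only checks that the path column is
in bytewise order:

```shell
# paths_sorted
# Hypothetical helper: succeed if the paths in "ls-files --stage"
# output on stdin are in bytewise (LC_ALL=C) order.
paths_sorted () {
	cut -f2- | LC_ALL=C sort -c 2>/dev/null
}

# Sorted order: the submodule entry sits between "b/b" and "z".
# (The OID for "z" is made up for illustration.)
printf '%s\t%s\n' \
	'100644 7898 0' a \
	'100644 6178 0' b/b \
	'100644 dead9 0' submodule/c \
	'100644 1234 0' z |
paths_sorted && echo "sorted"

# "Submodules at the end": submodule/c comes after "z".
printf '%s\t%s\n' \
	'100644 7898 0' a \
	'100644 1234 0' z \
	'100644 dead9 0' submodule/c |
paths_sorted || echo "not sorted"
```

The first pipeline prints "sorted" and the second "not sorted", so a
test with a "z" file in place would tell the two behaviors apart.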

