git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [RFD PATCH 0/2] How to migrate our test infrastructure away from sha1
@ 2017-07-28 17:18 Stefan Beller
  2017-07-28 17:18 ` [PATCH 1/2] alter hash function: expose test dependencies on sha1 Stefan Beller
  2017-07-28 17:18 ` [PATCH 2/2] t6500: mark tests as SHA1 reliant Stefan Beller
  0 siblings, 2 replies; 15+ messages in thread
From: Stefan Beller @ 2017-07-28 17:18 UTC (permalink / raw)
  To: git; +Cc: Stefan Beller

Brian got the ball rolling with his object-id patches
to fixup the code base, but we also need to discuss how
we approach our test infrastructure eventually.

C.f.:
* jb/t8008-cleanup (2017-07-26) 1 commit
 - t8008: rely on rev-parse'd HEAD instead of sha1 value
* sb/t4005-modernize (Remove hard coded sha1 values.)

This presents an alternative approach in the second patch,
which just marks tests as "SHA1" required instead of making
the test hash-agnostic.

The first patch is just an aid to show which tests need
addressing, although there are some false positives (i.e.
the test fails for other reasons than the hash dependency,
such as corruption of data.)

Stefan 

Stefan Beller (2):
  alter hash function: expose test dependencies on sha1
  t6500: mark tests as SHA1 reliant

 sha1dc/sha1.c | 2 +-
 t/t6500-gc.sh | 6 +++---
 t/test-lib.sh | 4 ++++
 3 files changed, 8 insertions(+), 4 deletions(-)

-- 
2.14.0.rc0.3.g6c2e499285


^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH 1/2] alter hash function: expose test dependencies on sha1
  2017-07-28 17:18 [RFD PATCH 0/2] How to migrate our test infrastructure away from sha1 Stefan Beller
@ 2017-07-28 17:18 ` Stefan Beller
  2017-07-28 21:52   ` Junio C Hamano
  2017-07-28 17:18 ` [PATCH 2/2] t6500: mark tests as SHA1 reliant Stefan Beller
  1 sibling, 1 reply; 15+ messages in thread
From: Stefan Beller @ 2017-07-28 17:18 UTC (permalink / raw)
  To: git; +Cc: Stefan Beller

DO NOT APPLY.

Alter the hash function such that with this patch
any dependency on sha1 in tests will make the test
fail. This patch applied on master yields this list:

./t0000-basic.sh
./t0002-gitfile.sh
./t0005-signals.sh
./t0013-sha1dc.sh
./t0021-conversion.sh
./t0090-cache-tree.sh
./t1001-read-tree-m-2way.sh
./t1007-hash-object.sh
./t1011-read-tree-sparse-checkout.sh
./t1013-read-tree-submodule.sh
./t1100-commit-tree-options.sh
./t1200-tutorial.sh
./t1300-repo-config.sh
./t1304-default-acl.sh
./t1400-update-ref.sh
./t1411-reflog-show.sh
./t1450-fsck.sh
./t1507-rev-parse-upstream.sh
./t1512-rev-parse-disambiguation.sh
./t1700-split-index.sh
./t2011-checkout-invalid-head.sh
./t2013-checkout-submodule.sh
./t2015-checkout-unborn.sh
./t2017-checkout-orphan.sh
./t2022-checkout-paths.sh
./t2101-update-index-reupdate.sh
./t2107-update-index-basic.sh
./t2203-add-intent.sh
./t3033-merge-toplevel.sh
./t3102-ls-tree-wildcards.sh
./t3103-ls-tree-misc.sh
./t3201-branch-contains.sh
./t3301-notes.sh
./t3305-notes-fanout.sh
./t3306-notes-prune.sh
./t3308-notes-merge.sh
./t3309-notes-merge-auto-resolve.sh
./t3310-notes-merge-manual-resolve.sh
./t3311-notes-merge-fanout.sh
./t3400-rebase.sh
./t3404-rebase-interactive.sh
./t3405-rebase-malformed.sh
./t3408-rebase-multi-line.sh
./t3415-rebase-autosquash.sh
./t3419-rebase-patch-id.sh
./t3421-rebase-topology-linear.sh
./t3501-revert-cherry-pick.sh
./t3502-cherry-pick-merge.sh
./t3503-cherry-pick-root.sh
./t3506-cherry-pick-ff.sh
./t3509-cherry-pick-merge-df.sh
./t3600-rm.sh
./t3700-add.sh
./t3701-add-interactive.sh
./t3702-add-edit.sh
./t3903-stash.sh
./t3905-stash-include-untracked.sh
./t4002-diff-basic.sh
./t4007-rename-3.sh
./t4008-diff-break-rewrite.sh
./t4010-diff-pathspec.sh
./t4011-diff-symlink.sh
./t4013-diff-various.sh
./t4014-format-patch.sh
./t4015-diff-whitespace.sh
./t4020-diff-external.sh
./t4022-diff-rewrite.sh
./t4029-diff-trailing-space.sh
./t4030-diff-textconv.sh
./t4033-diff-patience.sh
./t4034-diff-words.sh
./t4039-diff-assume-unchanged.sh
./t4042-diff-textconv-caching.sh
./t4044-diff-index-unique-abbrev.sh
./t4045-diff-relative.sh
./t4048-diff-combined-binary.sh
./t4050-diff-histogram.sh
./t4052-stat-output.sh
./t4054-diff-bogus-tree.sh
./t4060-diff-submodule-option-diff-format.sh
./t4126-apply-empty.sh
./t4151-am-abort.sh
./t4202-log.sh
./t4205-log-pretty-formats.sh
./t4208-log-magic-pathspec.sh
./t4211-line-log.sh
./t4300-merge-tree.sh
./t5150-request-pull.sh
./t5300-pack-object.sh
./t5306-pack-nobase.sh
./t5308-pack-detect-duplicates.sh
./t5309-pack-delta-cycles.sh
./t5313-pack-bounds-checks.sh
./t5512-ls-remote.sh
./t5515-fetch-merge-logic.sh
./t5516-fetch-push.sh
./t5520-pull.sh
./t5521-pull-options.sh
./t6000-rev-list-misc.sh
./t6012-rev-list-simplify.sh
./t6020-merge-df.sh
./t6022-merge-rename.sh
./t6024-recursive-merge.sh
./t6030-bisect-porcelain.sh
./t6031-merge-filemode.sh
./t6035-merge-dir-to-symlink.sh
./t6300-for-each-ref.sh
./t6500-gc.sh
./t7003-filter-branch.sh
./t7012-skip-worktree-writing.sh
./t7063-status-untracked-cache.sh
./t7102-reset.sh
./t7106-reset-unborn-branch.sh
./t7112-reset-submodule.sh
./t7201-co.sh
./t7400-submodule-basic.sh
./t7506-status-submodule.sh
./t7507-commit-verbose.sh
./t7508-status.sh
./t7600-merge.sh
./t7607-merge-overwrite.sh
./t7609-merge-co-error-msgs.sh
./t8008-blame-formats.sh

Signed-off-by: Stefan Beller <sbeller@google.com>
---
 sha1dc/sha1.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/sha1dc/sha1.c b/sha1dc/sha1.c
index 25eded1399..e18acee9ca 100644
--- a/sha1dc/sha1.c
+++ b/sha1dc/sha1.c
@@ -1756,7 +1756,7 @@ static void sha1_process(SHA1_CTX* ctx, const uint32_t block[16])
 void SHA1DCInit(SHA1_CTX* ctx)
 {
 	ctx->total = 0;
-	ctx->ihv[0] = 0x67452301;
+	ctx->ihv[0] = 0x07452301;
 	ctx->ihv[1] = 0xEFCDAB89;
 	ctx->ihv[2] = 0x98BADCFE;
 	ctx->ihv[3] = 0x10325476;
-- 
2.14.0.rc0.3.g6c2e499285


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH 2/2] t6500: mark tests as SHA1 reliant
  2017-07-28 17:18 [RFD PATCH 0/2] How to migrate our test infrastructure away from sha1 Stefan Beller
  2017-07-28 17:18 ` [PATCH 1/2] alter hash function: expose test dependencies on sha1 Stefan Beller
@ 2017-07-28 17:18 ` Stefan Beller
  2017-07-28 17:59   ` Junio C Hamano
  2017-07-28 22:14   ` Junio C Hamano
  1 sibling, 2 replies; 15+ messages in thread
From: Stefan Beller @ 2017-07-28 17:18 UTC (permalink / raw)
  To: git; +Cc: Stefan Beller

The first test marked relies on hard coded sha1:

	# We need to create two object whose sha1s start with 17
	# since this is what git gc counts.  As it happens, these
	# two blobs will do so.
	test_commit 263 &&
	test_commit 410 &&

The next two seem to rely on state from the first one, I did not
investigate.

Signed-off-by: Stefan Beller <sbeller@google.com>
---
 t/t6500-gc.sh | 6 +++---
 t/test-lib.sh | 4 ++++
 2 files changed, 7 insertions(+), 3 deletions(-)

diff --git a/t/t6500-gc.sh b/t/t6500-gc.sh
index 41b0be575d..3900baa01d 100755
--- a/t/t6500-gc.sh
+++ b/t/t6500-gc.sh
@@ -43,7 +43,7 @@ test_expect_success 'gc is not aborted due to a stale symref' '
 	)
 '
 
-test_expect_success 'auto gc with too many loose objects does not attempt to create bitmaps' '
+test_expect_success SHA1 'auto gc with too many loose objects does not attempt to create bitmaps' '
 	test_config gc.auto 3 &&
 	test_config gc.autodetach false &&
 	test_config pack.writebitmaps true &&
@@ -77,7 +77,7 @@ run_and_wait_for_auto_gc () {
 	doesnt_matter=$(git gc --auto 9>&1)
 }
 
-test_expect_success 'background auto gc does not run if gc.log is present and recent but does if it is old' '
+test_expect_success SHA1 'background auto gc does not run if gc.log is present and recent but does if it is old' '
 	test_commit foo &&
 	test_commit bar &&
 	git repack &&
@@ -95,7 +95,7 @@ test_expect_success 'background auto gc does not run if gc.log is present and re
 	test_line_count = 1 packs
 '
 
-test_expect_success 'background auto gc respects lock for all operations' '
+test_expect_success SHA1 'background auto gc respects lock for all operations' '
 	# make sure we run a background auto-gc
 	test_commit make-pack &&
 	git repack &&
diff --git a/t/test-lib.sh b/t/test-lib.sh
index 1b6e53f78a..a5a54c6d4a 100644
--- a/t/test-lib.sh
+++ b/t/test-lib.sh
@@ -1127,6 +1127,10 @@ test_lazy_prereq JGIT '
 	type jgit
 '
 
+test_lazy_prereq SHA1 '
+	false
+'
+
 # SANITY is about "can you correctly predict what the filesystem would
 # do by only looking at the permission bits of the files and
 # directories?"  A typical example of !SANITY is running the test
-- 
2.14.0.rc0.3.g6c2e499285


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH 2/2] t6500: mark tests as SHA1 reliant
  2017-07-28 17:18 ` [PATCH 2/2] t6500: mark tests as SHA1 reliant Stefan Beller
@ 2017-07-28 17:59   ` Junio C Hamano
  2017-07-28 21:43     ` Stefan Beller
  2017-07-28 22:14   ` Junio C Hamano
  1 sibling, 1 reply; 15+ messages in thread
From: Junio C Hamano @ 2017-07-28 17:59 UTC (permalink / raw)
  To: Stefan Beller; +Cc: Git Mailing List

There is another one that is better suited for this demonstration patch,
whose sole point is about creating bunch of objects that prefix conflicts
with each other to test the abbreviation machinery.



On Fri, Jul 28, 2017 at 10:18 AM, Stefan Beller <sbeller@google.com> wrote:
> The first test marked relies on hard coded sha1:
>
>         # We need to create two object whose sha1s start with 17
>         # since this is what git gc counts.  As it happens, these
>         # two blobs will do so.
>         test_commit 263 &&
>         test_commit 410 &&
>
> The next two seem to rely on state from the first one, I did not
> investigate.
>
> Signed-off-by: Stefan Beller <sbeller@google.com>
> ---
>  t/t6500-gc.sh | 6 +++---
>  t/test-lib.sh | 4 ++++
>  2 files changed, 7 insertions(+), 3 deletions(-)
>
> diff --git a/t/t6500-gc.sh b/t/t6500-gc.sh
> index 41b0be575d..3900baa01d 100755
> --- a/t/t6500-gc.sh
> +++ b/t/t6500-gc.sh
> @@ -43,7 +43,7 @@ test_expect_success 'gc is not aborted due to a stale symref' '
>         )
>  '
>
> -test_expect_success 'auto gc with too many loose objects does not attempt to create bitmaps' '
> +test_expect_success SHA1 'auto gc with too many loose objects does not attempt to create bitmaps' '
>         test_config gc.auto 3 &&
>         test_config gc.autodetach false &&
>         test_config pack.writebitmaps true &&
> @@ -77,7 +77,7 @@ run_and_wait_for_auto_gc () {
>         doesnt_matter=$(git gc --auto 9>&1)
>  }
>
> -test_expect_success 'background auto gc does not run if gc.log is present and recent but does if it is old' '
> +test_expect_success SHA1 'background auto gc does not run if gc.log is present and recent but does if it is old' '
>         test_commit foo &&
>         test_commit bar &&
>         git repack &&
> @@ -95,7 +95,7 @@ test_expect_success 'background auto gc does not run if gc.log is present and re
>         test_line_count = 1 packs
>  '
>
> -test_expect_success 'background auto gc respects lock for all operations' '
> +test_expect_success SHA1 'background auto gc respects lock for all operations' '
>         # make sure we run a background auto-gc
>         test_commit make-pack &&
>         git repack &&
> diff --git a/t/test-lib.sh b/t/test-lib.sh
> index 1b6e53f78a..a5a54c6d4a 100644
> --- a/t/test-lib.sh
> +++ b/t/test-lib.sh
> @@ -1127,6 +1127,10 @@ test_lazy_prereq JGIT '
>         type jgit
>  '
>
> +test_lazy_prereq SHA1 '
> +       false
> +'
> +
>  # SANITY is about "can you correctly predict what the filesystem would
>  # do by only looking at the permission bits of the files and
>  # directories?"  A typical example of !SANITY is running the test
> --
> 2.14.0.rc0.3.g6c2e499285
>

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 2/2] t6500: mark tests as SHA1 reliant
  2017-07-28 17:59   ` Junio C Hamano
@ 2017-07-28 21:43     ` Stefan Beller
  0 siblings, 0 replies; 15+ messages in thread
From: Stefan Beller @ 2017-07-28 21:43 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Git Mailing List

On Fri, Jul 28, 2017 at 10:59 AM, Junio C Hamano <gitster@pobox.com> wrote:
> There is another one that is better suited for this demonstration patch,
> whose sole point is about creating bunch of objects that prefix conflicts
> with each other to test the abbreviation machinery.

'Another one' meaning in another test file? I'll go look for these
tests.

The first patch produces lots of false positives, so maybe I'll need
to work on that (is there another code path for sha1?).

Thanks,
Stefan

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 1/2] alter hash function: expose test dependencies on sha1
  2017-07-28 17:18 ` [PATCH 1/2] alter hash function: expose test dependencies on sha1 Stefan Beller
@ 2017-07-28 21:52   ` Junio C Hamano
  0 siblings, 0 replies; 15+ messages in thread
From: Junio C Hamano @ 2017-07-28 21:52 UTC (permalink / raw)
  To: Stefan Beller; +Cc: git

Stefan Beller <sbeller@google.com> writes:

> DO NOT APPLY.
>
> Alter the hash function such that with this patch
> any dependency on sha1 in tests will make the test
> fail. This patch applied on master yields this list:
>
> ./t0000-basic.sh
> ....
> ./t8008-blame-formats.sh
>
> Signed-off-by: Stefan Beller <sbeller@google.com>
> ---
>  sha1dc/sha1.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)

Everybody knew and said that we need to make tests less dependent on
the exact hash and in such a way that the ones that currently do
would test the morally equivalent thing with a new hash before we
can go forward.  And everybody said that one way to start that is to
build a Git with different hash function and trying to run tests.

This is the first report of doing so, and the large list above is
the biggest contribution this message makes to the "let's wean
ourselves off of SHA-1" journey.

Thanks for starting this.

>
> diff --git a/sha1dc/sha1.c b/sha1dc/sha1.c
> index 25eded1399..e18acee9ca 100644
> --- a/sha1dc/sha1.c
> +++ b/sha1dc/sha1.c
> @@ -1756,7 +1756,7 @@ static void sha1_process(SHA1_CTX* ctx, const uint32_t block[16])
>  void SHA1DCInit(SHA1_CTX* ctx)
>  {
>  	ctx->total = 0;
> -	ctx->ihv[0] = 0x67452301;
> +	ctx->ihv[0] = 0x07452301;
>  	ctx->ihv[1] = 0xEFCDAB89;
>  	ctx->ihv[2] = 0x98BADCFE;
>  	ctx->ihv[3] = 0x10325476;

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 2/2] t6500: mark tests as SHA1 reliant
  2017-07-28 17:18 ` [PATCH 2/2] t6500: mark tests as SHA1 reliant Stefan Beller
  2017-07-28 17:59   ` Junio C Hamano
@ 2017-07-28 22:14   ` Junio C Hamano
  2017-07-29 17:58     ` brian m. carlson
  1 sibling, 1 reply; 15+ messages in thread
From: Junio C Hamano @ 2017-07-28 22:14 UTC (permalink / raw)
  To: Stefan Beller; +Cc: git

Stefan Beller <sbeller@google.com> writes:

> The first test marked relies on hard coded sha1:
>
> 	# We need to create two object whose sha1s start with 17
> 	# since this is what git gc counts.  As it happens, these
> 	# two blobs will do so.
> 	test_commit 263 &&
> 	test_commit 410 &&
>
> The next two seem to rely on state from the first one, I did not
> investigate.

I am moderately negative on this approach, if it is meant to suggest
the final shape of our test suite patch 1/2 started.

This script may be a good example you can use to demonstrate a much
better approach.  As the above comment in the test shows, we want to
create two objects whose object names begin with "17", and running
test_commit with 263 and 410 at this point in the test was a way to
achieve that when Git uses SHA-1 as its hash.

When we use a hash different from SHA-1, the exact strings 263 and
410 may change, but we should be able to find two other strings that
has the same property (i.e. they results in objects that share the
prefix "17").  Perhaps a better way forward for this kind of test is
to parameterize these hardcoded constants and make it easier to use
different values without having to change the rest of the script
when we switch the hash function?  So perhaps have something like

	case "$GIT_HASH_FUNCTION" in
	SHA-1)	
		TEST_17_1="263 410" ;;
	CORRUPT-SHA-1)	
		TEST_17_1="something else" ;;
        esac

near the top of the script and update the above two with:

	for token in $TEST_17_1
	do
		test_commit "$token" || return 1
	done &&

would prepare us to switch to SHA-256 or whatever hash function we
choose to use in the future?

t1512 is another example with full of such tests that we would want
to keep moral equivalents in the world with a different hash,
instead of reducing the test coverage.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 2/2] t6500: mark tests as SHA1 reliant
  2017-07-28 22:14   ` Junio C Hamano
@ 2017-07-29 17:58     ` brian m. carlson
  2017-07-30 21:21       ` Junio C Hamano
  0 siblings, 1 reply; 15+ messages in thread
From: brian m. carlson @ 2017-07-29 17:58 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Stefan Beller, git

[-- Attachment #1: Type: text/plain, Size: 2297 bytes --]

On Fri, Jul 28, 2017 at 03:14:49PM -0700, Junio C Hamano wrote:
> Stefan Beller <sbeller@google.com> writes:
> 
> > The first test marked relies on hard coded sha1:
> >
> > 	# We need to create two object whose sha1s start with 17
> > 	# since this is what git gc counts.  As it happens, these
> > 	# two blobs will do so.
> > 	test_commit 263 &&
> > 	test_commit 410 &&
> >
> > The next two seem to rely on state from the first one, I did not
> > investigate.
> 
> I am moderately negative on this approach, if it is meant to suggest
> the final shape of our test suite patch 1/2 started.
> 
> This script may be a good example you can use to demonstrate a much
> better approach.  As the above comment in the test shows, we want to
> create two objects whose object names begin with "17", and running
> test_commit with 263 and 410 at this point in the test was a way to
> achieve that when Git uses SHA-1 as its hash.
> 
> When we use a hash different from SHA-1, the exact strings 263 and
> 410 may change, but we should be able to find two other strings that
> has the same property (i.e. they results in objects that share the
> prefix "17").  Perhaps a better way forward for this kind of test is
> to parameterize these hardcoded constants and make it easier to use
> different values without having to change the rest of the script
> when we switch the hash function?  So perhaps have something like
> 
> 	case "$GIT_HASH_FUNCTION" in
> 	SHA-1)	
> 		TEST_17_1="263 410" ;;
> 	CORRUPT-SHA-1)	
> 		TEST_17_1="something else" ;;
>         esac

One approach I had considered taking is having a helper of some sort
that wrapped a simple key/value store.  We could pass the wrapper the
SHA-1 value (or, if necessary, an arbitrary key) and have it return the
proper value based on the given hash function.

That does have the downsides that the values may not present in the
tests themselves, and that people adding new tests will of course need
to run the test suite twice.  But it does make the tests easier to read.

Opinions on the desirability of this approach are of course welcome.
-- 
brian m. carlson / brian with sandals: Houston, Texas, US
https://www.crustytoothpaste.net/~bmc | My opinion only
OpenPGP: https://keybase.io/bk2204

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 868 bytes --]

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 2/2] t6500: mark tests as SHA1 reliant
  2017-07-29 17:58     ` brian m. carlson
@ 2017-07-30 21:21       ` Junio C Hamano
  2017-07-30 23:00         ` brian m. carlson
  0 siblings, 1 reply; 15+ messages in thread
From: Junio C Hamano @ 2017-07-30 21:21 UTC (permalink / raw)
  To: brian m. carlson; +Cc: Stefan Beller, git

"brian m. carlson" <sandals@crustytoothpaste.net> writes:

> One approach I had considered taking is having a helper of some sort
> that wrapped a simple key/value store.  We could pass the wrapper the
> SHA-1 value (or, if necessary, an arbitrary key) and have it return the
> proper value based on the given hash function.
>
> That does have the downsides that the values may not present in the
> tests themselves, and that people adding new tests will of course need
> to run the test suite twice.  But it does make the tests easier to read.
>
> Opinions on the desirability of this approach are of course welcome.

I am not quite sure if I follow.  There was a proposal to tweak the
commit format that uses the new hash in such a way that we can tell
what SHA-1 would have been used if everything were SHA-1 (I think it
was from Jonathan, but I may be mistaken), and I recall that
generally the list were receptive to the idea.  But I have a feeling
that your "helper of some sort" is something else.

If your <key,value> is about letting us store something like

 - If you hash "hello\n" the resulting blob in SHA-1 world has this
   object name, and with that, you can find out the equivalent
   object name in SHA-256 world.

 - If you have a tree with the above blob at path P and nothing
   else, then the object name of that tree in the SHA-1 world and
   SHA-256 world are different and we can map between them.

 - Likewise for a commit that points at the above tree with fixed
   date, author and message.

I am not sure how much it would help.  Are you aiming to make it
easier and more structured to create a patch like what Stefan did
recently for t8008 in 0ba9c9a0 ("t8008: rely on rev-parse'd HEAD
instead of sha1 value", 2017-07-26)?

I also suspect that tests like t1512 and t6500 would not benefit
that much from such a mapping.  In these tests, the object names by
themselves are not interesting.  These tests are about what Git does
when the names of the objects involved in them happen to share a
certain prefix.  We are not interested in using the same payload in
these tests using different hash, which is likely to destroy the
aspect of the object names that these tests are interested in,
namely, they share the same prefix.  When updating these tests to
adjust for the SHA-256 world, we want to preserve that the resulting
object names happen to share the same prefix by tweaking the payload
strings (i.e. "263 and 410" in t6500 are chosen to cause the
resulting objects to share "17/" prefix and fall inside a same
fan-out directory as loose objects.  We want to choose different
strings so that the names of the resulting objects share the same
prefix, not necessarily "17/" but preferrably so, in the SHA-256
world.  Similarly, Random-looking strings like "a2onsxbvj" in t1512
are chosen to cause blobs, trees, commits and tags that are involved
in the test to all share the same prefix "000000..."; we want to
choose different set of such random-looking strings that cause all
objects involved to hash to the same prefix, not necessarily but
preferrably "000000...").



^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 2/2] t6500: mark tests as SHA1 reliant
  2017-07-30 21:21       ` Junio C Hamano
@ 2017-07-30 23:00         ` brian m. carlson
  2017-07-30 23:24           ` brian m. carlson
  0 siblings, 1 reply; 15+ messages in thread
From: brian m. carlson @ 2017-07-30 23:00 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Stefan Beller, git

[-- Attachment #1: Type: text/plain, Size: 3037 bytes --]

On Sun, Jul 30, 2017 at 02:21:50PM -0700, Junio C Hamano wrote:
> "brian m. carlson" <sandals@crustytoothpaste.net> writes:
> 
> > One approach I had considered taking is having a helper of some sort
> > that wrapped a simple key/value store.  We could pass the wrapper the
> > SHA-1 value (or, if necessary, an arbitrary key) and have it return the
> > proper value based on the given hash function.
> >
> > That does have the downsides that the values may not present in the
> > tests themselves, and that people adding new tests will of course need
> > to run the test suite twice.  But it does make the tests easier to read.
> >
> > Opinions on the desirability of this approach are of course welcome.
> 
> I am not quite sure if I follow.  There was a proposal to tweak the
> commit format that uses the new hash in such a way that we can tell
> what SHA-1 would have been used if everything were SHA-1 (I think it
> was from Jonathan, but I may be mistaken), and I recall that
> generally the list were receptive to the idea.  But I have a feeling
> that your "helper of some sort" is something else.
> 
> If your <key,value> is about letting us store something like
> 
>  - If you hash "hello\n" the resulting blob in SHA-1 world has this
>    object name, and with that, you can find out the equivalent
>    object name in SHA-256 world.
> 
>  - If you have a tree with the above blob at path P and nothing
>    else, then the object name of that tree in the SHA-1 world and
>    SHA-256 world are different and we can map between them.
> 
>  - Likewise for a commit that points at the above tree with fixed
>    date, author and message.
> 
> I am not sure how much it would help.  Are you aiming to make it
> easier and more structured to create a patch like what Stefan did
> recently for t8008 in 0ba9c9a0 ("t8008: rely on rev-parse'd HEAD
> instead of sha1 value", 2017-07-26)?

Yes, basically, but a bit more generally.  There will always be cases in
which we need to specify an object ID or an arbitrary string and the
behavior will need to vary based on the hash.  That can be something
like, in this case, the two blob contents that would have the similar
prefix.

So in this case, we pass the helper the string "263 410" and get back a
value for either the hacked SHA-1 hash or the SHA-256 or whatever we're
using.

This was basically a nicer way of wrapping the case statement that you
had given as an example.  Of course, it doesn't relieve us of doing the
hard work of analyzing the tests, which Stefan is doing, and with which
I don't want to interfere.

It was simply a proposal for a future direction which we could take if
we found ourselves needing to write a large number of hash-specific case
statements.  I'm happy to wait to actually implement that code until we
decide we need such a thing.
-- 
brian m. carlson / brian with sandals: Houston, Texas, US
https://www.crustytoothpaste.net/~bmc | My opinion only
OpenPGP: https://keybase.io/bk2204

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 868 bytes --]

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 2/2] t6500: mark tests as SHA1 reliant
  2017-07-30 23:00         ` brian m. carlson
@ 2017-07-30 23:24           ` brian m. carlson
  2017-07-31 20:17             ` Ævar Arnfjörð Bjarmason
  2017-07-31 20:26             ` Stefan Beller
  0 siblings, 2 replies; 15+ messages in thread
From: brian m. carlson @ 2017-07-30 23:24 UTC (permalink / raw)
  To: Junio C Hamano, Stefan Beller, git

[-- Attachment #1: Type: text/plain, Size: 1035 bytes --]

On Sun, Jul 30, 2017 at 11:00:19PM +0000, brian m. carlson wrote:
> Yes, basically, but a bit more generally.  There will always be cases in
> which we need to specify an object ID or an arbitrary string and the
> behavior will need to vary based on the hash.  That can be something
> like, in this case, the two blob contents that would have the similar
> prefix.
> 
> So in this case, we pass the helper the string "263 410" and get back a
> value for either the hacked SHA-1 hash or the SHA-256 or whatever we're
> using.

I realize this was worded poorly.  So for my example, in this case, we'd
do:

test-helper-hash-string "263 410"

For SHA-1, we'd get "263 410".  For SHA-256, we'd get "313 481" (which,
as SHA-256 blobs, both start with "17" in their hex representation).
Presumably we'd read some environment variable to determine the proper
value.
-- 
brian m. carlson / brian with sandals: Houston, Texas, US
https://www.crustytoothpaste.net/~bmc | My opinion only
OpenPGP: https://keybase.io/bk2204

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 868 bytes --]

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 2/2] t6500: mark tests as SHA1 reliant
  2017-07-30 23:24           ` brian m. carlson
@ 2017-07-31 20:17             ` Ævar Arnfjörð Bjarmason
  2017-07-31 20:30               ` Stefan Beller
  2017-07-31 20:26             ` Stefan Beller
  1 sibling, 1 reply; 15+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2017-07-31 20:17 UTC (permalink / raw)
  To: brian m. carlson, Junio C Hamano, Stefan Beller, Git Mailing List

On Mon, Jul 31, 2017 at 1:24 AM, brian m. carlson
<sandals@crustytoothpaste.net> wrote:
> On Sun, Jul 30, 2017 at 11:00:19PM +0000, brian m. carlson wrote:
>> Yes, basically, but a bit more generally.  There will always be cases in
>> which we need to specify an object ID or an arbitrary string and the
>> behavior will need to vary based on the hash.  That can be something
>> like, in this case, the two blob contents that would have the similar
>> prefix.
>>
>> So in this case, we pass the helper the string "263 410" and get back a
>> value for either the hacked SHA-1 hash or the SHA-256 or whatever we're
>> using.
>
> I realize this was worded poorly.  So for my example, in this case, we'd
> do:
>
> test-helper-hash-string "263 410"
>
> For SHA-1, we'd get "263 410".  For SHA-256, we'd get "313 481" (which,
> as SHA-256 blobs, both start with "17" in their hex representation).
> Presumably we'd read some environment variable to determine the proper
> value.

I've been mostly out of the loop on this hash transition plan, but
don't we expect to be compiling a git that knows about both SHA-1 and
whatever the $newhash is? If so it seems better to just test all N
hashes we have:

    test_expect_success_hash $desc_description '
        hash_value=$(test-helper-hash-string $CURRENT_HASH)
        ....
    '

Then test_expect_success_hash would run N times for the N hashes we have.

This would obviously be slightly more hassle to write & convert, but I
think it would be worth it, particularly with something like Travis
where we can test all hashes, instead of being in some mode where we
fragment on all of hashes/gettext poison and whatever other
compilation option we have that really requires compiling a new git
version...

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 2/2] t6500: mark tests as SHA1 reliant
  2017-07-30 23:24           ` brian m. carlson
  2017-07-31 20:17             ` Ævar Arnfjörð Bjarmason
@ 2017-07-31 20:26             ` Stefan Beller
  2017-07-31 23:54               ` brian m. carlson
  1 sibling, 1 reply; 15+ messages in thread
From: Stefan Beller @ 2017-07-31 20:26 UTC (permalink / raw)
  To: brian m. carlson, Junio C Hamano, Stefan Beller,
	git@vger.kernel.org

On Sun, Jul 30, 2017 at 4:24 PM, brian m. carlson
<sandals@crustytoothpaste.net> wrote:
> On Sun, Jul 30, 2017 at 11:00:19PM +0000, brian m. carlson wrote:
>> Yes, basically, but a bit more generally.  There will always be cases in
>> which we need to specify an object ID or an arbitrary string and the
>> behavior will need to vary based on the hash.  That can be something
>> like, in this case, the two blob contents that would have the similar
>> prefix.
>>
>> So in this case, we pass the helper the string "263 410" and get back a
>> value for either the hacked SHA-1 hash or the SHA-256 or whatever we're
>> using.
>
> I realize this was worded poorly.  So for my example, in this case, we'd
> do:
>
> test-helper-hash-string "263 410"
>
> For SHA-1, we'd get "263 410".  For SHA-256, we'd get "313 481" (which,
> as SHA-256 blobs, both start with "17" in their hex representation).
> Presumably we'd read some environment variable to determine the proper
> value.

This is what Junio proposed in the first message, except that we defer that
to a shell script as for each test we may need different things, so a helper may
be of little value?

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 2/2] t6500: mark tests as SHA1 reliant
  2017-07-31 20:17             ` Ævar Arnfjörð Bjarmason
@ 2017-07-31 20:30               ` Stefan Beller
  0 siblings, 0 replies; 15+ messages in thread
From: Stefan Beller @ 2017-07-31 20:30 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: brian m. carlson, Junio C Hamano, Git Mailing List

On Mon, Jul 31, 2017 at 1:17 PM, Ævar Arnfjörð Bjarmason
<avarab@gmail.com> wrote:
> On Mon, Jul 31, 2017 at 1:24 AM, brian m. carlson
> <sandals@crustytoothpaste.net> wrote:
>> On Sun, Jul 30, 2017 at 11:00:19PM +0000, brian m. carlson wrote:
>>> Yes, basically, but a bit more generally.  There will always be cases in
>>> which we need to specify an object ID or an arbitrary string and the
>>> behavior will need to vary based on the hash.  That can be something
>>> like, in this case, the two blob contents that would have the similar
>>> prefix.
>>>
>>> So in this case, we pass the helper the string "263 410" and get back a
>>> value for either the hacked SHA-1 hash or the SHA-256 or whatever we're
>>> using.
>>
>> I realize this was worded poorly.  So for my example, in this case, we'd
>> do:
>>
>> test-helper-hash-string "263 410"
>>
>> For SHA-1, we'd get "263 410".  For SHA-256, we'd get "313 481" (which,
>> as SHA-256 blobs, both start with "17" in their hex representation).
>> Presumably we'd read some environment variable to determine the proper
>> value.
>
> I've been mostly out of the loop on this hash transition plan, but
> don't we expect to be compiling a git that knows about both SHA-1 and
> whatever the $newhash is?

That is my understanding as well.

> If so it seems better to just test all N
> hashes we have:
>
>     test_expect_success_hash $desc_description '
>         hash_value=$(test-helper-hash-string $CURRENT_HASH)
>         ....
>     '
>
> Then test_expect_success_hash would run N times for the N hashes we have.

I think that is just adding more workload without furthering the stated goal
which is usually reached with just one hash function. The tests we're talking
about here are not trying to test correctness of hashes but some other
functionality
(correct abbreviation length, collisions in prefix, etc.) that would not change
depending on the hash function used, I imagine.

For t0000 we want to have multiple versions, one for each hash.

> This would obviously be slightly more hassle to write & convert, but I
> think it would be worth it, particularly with something like Travis
> where we can test all hashes, instead of being in some mode where we
> fragment on all of hashes/gettext poison and whatever other
> compilation option we have that really requires compiling a new git
> version...

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 2/2] t6500: mark tests as SHA1 reliant
  2017-07-31 20:26             ` Stefan Beller
@ 2017-07-31 23:54               ` brian m. carlson
  0 siblings, 0 replies; 15+ messages in thread
From: brian m. carlson @ 2017-07-31 23:54 UTC (permalink / raw)
  To: Stefan Beller; +Cc: Junio C Hamano, git@vger.kernel.org

[-- Attachment #1: Type: text/plain, Size: 1131 bytes --]

On Mon, Jul 31, 2017 at 01:26:40PM -0700, Stefan Beller wrote:
> On Sun, Jul 30, 2017 at 4:24 PM, brian m. carlson
> <sandals@crustytoothpaste.net> wrote:
> > I realize this was worded poorly.  So for my example, in this case, we'd
> > do:
> >
> > test-helper-hash-string "263 410"
> >
> > For SHA-1, we'd get "263 410".  For SHA-256, we'd get "313 481" (which,
> > as SHA-256 blobs, both start with "17" in their hex representation).
> > Presumably we'd read some environment variable to determine the proper
> > value.
> 
> This is what Junio proposed in the first message, except that we defer that
> to a shell script as for each test we may need different things, so a helper may
> be of little value?

I think a shell script may end up being a fine helper for our needs.  In
any case, I think we're in violent agreement.  My proposal was just an
idea I had considered when thinking about how to do this, and any
suitable solution is fine with me.
-- 
brian m. carlson / brian with sandals: Houston, Texas, US
https://www.crustytoothpaste.net/~bmc | My opinion only
OpenPGP: https://keybase.io/bk2204

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 868 bytes --]

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2017-07-31 23:55 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-07-28 17:18 [RFD PATCH 0/2] How to migrate our test infrastructure away from sha1 Stefan Beller
2017-07-28 17:18 ` [PATCH 1/2] alter hash function: expose test dependencies on sha1 Stefan Beller
2017-07-28 21:52   ` Junio C Hamano
2017-07-28 17:18 ` [PATCH 2/2] t6500: mark tests as SHA1 reliant Stefan Beller
2017-07-28 17:59   ` Junio C Hamano
2017-07-28 21:43     ` Stefan Beller
2017-07-28 22:14   ` Junio C Hamano
2017-07-29 17:58     ` brian m. carlson
2017-07-30 21:21       ` Junio C Hamano
2017-07-30 23:00         ` brian m. carlson
2017-07-30 23:24           ` brian m. carlson
2017-07-31 20:17             ` Ævar Arnfjörð Bjarmason
2017-07-31 20:30               ` Stefan Beller
2017-07-31 20:26             ` Stefan Beller
2017-07-31 23:54               ` brian m. carlson

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).