On Sat, Jul 21, 2018 at 09:52:05PM +0200, Ævar Arnfjörð Bjarmason wrote:
> 
> On Fri, Jul 20 2018, brian m. carlson wrote:
> > I know this discussion has sort of petered out, but I'd like to see if
> > we can revive it.  I'm writing index v3 and having a decision would help
> > me write tests for it.
> >
> > To summarize the discussion that's been had in addition to the above,
> > Ævar has also stated a preference for SHA-256 and I would prefer BLAKE2b
> > over SHA-256 over SHA3-256, although any of them would be fine.
> >
> > Are there other contributors who have a strong opinion?  Are there
> > things I can do to help us coalesce around an option?
> 
> I have a vague recollection of suggesting something similar in the past,
> but can't find that E-Mail (and maybe it never happened), but for
> testing purposes isn't it simplest if we just have some "test SHA-1"
> algorithm where we pretend that all inputs like "STRING" are really
> "PREFIX-STRING" for the purposes of hashing, or fake shortening /
> lengthening the hash to test arbitrary lengths of N (just by repeating
> the hash from the beginning is probably good enough...).
> 
> That would make such patches easier to review, since we wouldn't need to
> carry hundreds/thousands of lines of dense hashing code, but a more
> trivial wrapper around SHA-1, and we could have some test mode where we
> could compile & run tests with an arbitrary hash length to make sure
> everything's future proof even after we move to NewHash.

I think Stefan suggested this approach.  It is viable for testing some
aspects of the code, but not others.  It doesn't work for synthesizing
partial collisions or the bisect tests (since bisect falls back to
object ID as a disambiguator).

I tried this approach (using a single zero byte as a prefix), but for
whatever reason, it ended up producing inconsistent results when I
hashed.  I'm not sure what went wrong, and I finally discarded it after
spending an hour or two staring at it.  I'm not opposed to someone else
providing it as an option, though.

Also, after feedback from Eric Sunshine, I decided to adopt an approach
for my hash-independent tests series that used the name of the hash
within the tests so that we could support additional algorithms (such as
a pseudo-SHA-1).  That work necessarily involves having a name for the
hash, which is why I haven't revisited it.

As for arbitrary hash sizes, there is some code which necessarily needs
to depend on a fixed hash size.  A lot of our Perl code matches
[0-9a-f]{40}, which needs to change.  There's no reason we couldn't
adopt such testing in the future, but it might end up being more
complicated than we want.  I have strived to reduce the dependence on
fixed-size constants wherever possible, though.
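
As a rough illustration of what removing a hard-coded {40} could look
like (sketched in Python rather than Perl; the HEX_LENGTHS table and
the oid_pattern helper are hypothetical names, not anything in the
tree), the pattern can be built from the algorithm's hex length:

```python
import re

# Hex digest lengths per algorithm; the table itself is an assumption
# made for this sketch.
HEX_LENGTHS = {"sha1": 40, "sha256": 64}


def oid_pattern(algo: str) -> "re.Pattern[str]":
    """Return a compiled pattern matching one hex object ID for algo,
    instead of hard-coding [0-9a-f]{40}."""
    return re.compile(r"[0-9a-f]{%d}" % HEX_LENGTHS[algo])


if __name__ == "__main__":
    print(bool(oid_pattern("sha1").fullmatch("a" * 40)))    # 40 hex chars
    print(bool(oid_pattern("sha256").fullmatch("a" * 40)))  # too short
```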
-- 
brian m. carlson: Houston, Texas, US
OpenPGP: https://keybase.io/bk2204