Hi Ævar,

On Fri, 2 Jun 2017, Ævar Arnfjörð Bjarmason wrote:

> On Fri, Jun 2, 2017 at 7:54 PM, Jonathan Nieder <jrnieder@gmail.com> wrote:
> >
> > Johannes Schindelin wrote:
> >> On Thu, 1 Jun 2017, Stefan Beller wrote:
> >
> >>> We had a discussion off list how much of the test suite is in bad
> >>> shape, and "$ git grep ^index" points out a lot of places as well.
> >>
> >> Maybe we should call out a specific month (or even a longer period)
> >> during which we try to push toward that new hash function, and focus
> >> more on those tasks (and on critical bug fixes, if any) than anything
> >> else.
> >
> > Thanks for offering. ;-)
> >
> > Here's a rough list of some useful tasks, in no particular order:
> >
> > 1. bc/object-id: This patch series continues, eliminating assumptions
> >    about the size of object ids by encapsulating them in a struct.
> >    One straightforward way to find code that still needs to be
> >    converted is to grep for "sha" --- often the conversion patches
> >    change function and variable names to refer to oid_ where they used
> >    to use sha1_, making the stragglers easier to spot.
> >
> > 2. Hard-coded object ids in tests: As Stefan hinted, many tests beyond
> >    t00* make assumptions about the exact values of object ids.  That's
> >    bad for maintainability for other reasons beyond the hash function
> >    transition, too.
> >
> >    It should be possible to suss them out by patching git's sha1
> >    routine to use the ones-complement of sha1 (~sha1) instead and
> >    seeing which tests fail.
> 
> I just hacked this up locally.

Would you mind turning that into a patch? I figure this could be a
compile-time switch and applied to Git's `master` so that it is easier to
find those assumptions, as well as verify fixes on multiple platforms.

> > 4. When choosing a hash function, people may argue about performance.
> >    It would be useful for run some benchmarks for git (running
> >    the test suite, t/perf tests, etc) using a variety of hash
> >    functions as input to such a discussion.
> 
> To the extent that such benchmarks matter, it seems prudent to heavily
> weigh them in favor of whatever seems to be likely to be the more
> common hash function going forward, since those are likely to get
> faster through future hardware acceleration.

Exactly.

As I just mentioned elsewhere, the cryptographers I talked to expect
SHA-256 and SHA3-256 to see the most focus of all hash functions, by far.

> E.g. Intel announced Goldmont last year which according to one SHA-1
> implementation improved from 9.5 cycles per byte to 2.7 cpb[1]. They
> only have acceleration for SHA-1 and SHA-256[2]
> 
> 1. https://github.com/weidai11/cryptopp/issues/139#issuecomment-264283385
> 
> 2. https://en.wikipedia.org/wiki/Goldmont

Thanks for digging that up! Very valuable information.

Ciao,
Dscho