Hi Ævar,

On Fri, 2 Jun 2017, Ævar Arnfjörð Bjarmason wrote:

> On Fri, Jun 2, 2017 at 7:54 PM, Jonathan Nieder wrote:
> >
> > Johannes Schindelin wrote:
> >> On Thu, 1 Jun 2017, Stefan Beller wrote:
> >
> >>> We had a discussion off list how much of the test suite is in bad
> >>> shape, and "$ git grep ^index" points out a lot of places as well.
> >>
> >> Maybe we should call out a specific month (or even a longer period)
> >> during which we try to push toward that new hash function, and focus
> >> more on those tasks (and on critical bug fixes, if any) than anything
> >> else.
> >
> > Thanks for offering. ;-)
> >
> > Here's a rough list of some useful tasks, in no particular order:
> >
> > 1. bc/object-id: This patch series continues, eliminating assumptions
> >    about the size of object ids by encapsulating them in a struct.
> >    One straightforward way to find code that still needs to be
> >    converted is to grep for "sha" --- often the conversion patches
> >    change function and variable names to refer to oid_ where they used
> >    to use sha1_, making the stragglers easier to spot.
> >
> > 2. Hard-coded object ids in tests: As Stefan hinted, many tests beyond
> >    t00* make assumptions about the exact values of object ids. That's
> >    bad for maintainability for other reasons beyond the hash function
> >    transition, too.
> >
> >    It should be possible to suss them out by patching git's sha1
> >    routine to use the ones-complement of sha1 (~sha1) instead and
> >    seeing which tests fail.
>
> I just hacked this up locally.

Would you mind turning that into a patch? I figure this could be a
compile-time switch and applied to Git's `master` so that it is easier to
find those assumptions, as well as verify fixes on multiple platforms.

> > 4. When choosing a hash function, people may argue about performance.
> >    It would be useful for run some benchmarks for git (running
> >    the test suite, t/perf tests, etc) using a variety of hash
> >    functions as input to such a discussion.
> >
> To the extent that such benchmarks matter, it seems prudent to heavily
> weigh them in favor of whatever seems to be likely to be the more
> common hash function going forward, since those are likely to get
> faster through future hardware acceleration.

Exactly. As I just mentioned elsewhere, the cryptographers I talked to
expect SHA-256 and SHA3-256 to see the most focus of all hash functions,
by far.

> E.g. Intel announced Goldmont last year which according to one SHA-1
> implementation improved from 9.5 cycles per byte to 2.7 cpb[1]. They
> only have acceleration for SHA-1 and SHA-256[2]
>
> 1. https://github.com/weidai11/cryptopp/issues/139#issuecomment-264283385
>
> 2. https://en.wikipedia.org/wiki/Goldmont

Thanks for digging that up! Very valuable information.

Ciao,
Dscho
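
For readers following the thread, here is a minimal sketch of the
ones-complement idea quoted above. It is an illustration only, not the
patch discussed in this thread and not code from Git: the function names
`git_blob_sha1` and `complemented_sha1` are hypothetical. It assumes the
usual blob object encoding (a "blob <size>" header, a NUL byte, then the
content) and shows that flipping every bit of the digest yields an id that
is just as consistent, but that no hard-coded value in a test would match.

```python
import hashlib


def git_blob_sha1(data: bytes) -> bytes:
    """Return the raw 20-byte SHA-1 of a blob: header, NUL, then content."""
    header = b"blob %d\0" % len(data)
    return hashlib.sha1(header + data).digest()


def complemented_sha1(data: bytes) -> bytes:
    """Hypothetical test-only variant: bitwise NOT of every digest byte."""
    return bytes(b ^ 0xFF for b in git_blob_sha1(data))


if __name__ == "__main__":
    content = b"hello\n"
    print("sha1 :", git_blob_sha1(content).hex())
    print("~sha1:", complemented_sha1(content).hex())
```

The complemented id is a one-to-one function of the real one, so object
lookups would stay self-consistent; only code or tests that assume specific
id values would notice the difference.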