On Thu, Apr 26, 2018 at 05:46:28PM +0200, Duy Nguyen wrote:
> On Wed, Apr 25, 2018 at 8:49 PM, Martin Ågren wrote:
> > Once that is accomplished, I sort of suspect that this code will
> > want to be updated to not always blindly use the_hash_algo, but to
> > always work with SHA-1 sizes. Or rather, this would turn into more
> > generic code to handle both "v2 with SHA-1" and "v3 with some hash
> > function(s)". This commit might be a good first step in that
> > direction.
>
> I also have an uneasy feeling when things this close to on-disk file
> format get hash-agnostic treatment. I think we would need to start
> adding new file formats soon, from bottom up with simple things like
> loose object files (cat-file and hash-object should be enough to test
> blobs...), then moving up to pack files and more. This is when we can
> really decide where to use the new hash and whether we should keep
> some hashes as sha-1.

I agree that this is work which needs to be done soon.  There are
basically a number of pieces we need in order to handle NewHash:

* Remove the dependencies on SHA-1 as much as possible.
* Get the tests to pass with a different hash (almost done for a
  160-bit hash; in progress for 256-bit hashes).
* Write pack code.
* Write loose object index code.
* Write read-as-SHA-1 code.
* Force the codebase to always use SHA-1 when dealing with fetch/push.
* Distinguish between code which needs to use compatObjectFormat and
  code which needs to use objectFormat (sketched at the end of this
  mail).
* Decide on NewHash.

I'm working on the top two bullet points right now.  Others are
welcome to pick up other pieces, or I'll get to them eventually.

As much as I'm dreading having the bikeshedding discussion over what
we're going to pick for NewHash, some of these pieces require knowing
what algorithm it will be.  For example, we have some tests which
either need to be completely rewritten or need a translation table
written for them (think the ones that use colliding short names).  In
order for those tests to have the translation table written, we need
to be able to compute colliding values.  I'm annotating these with
prerequisites, but there are quite a few tests which are skipped.

I expect writing the pack, loose object index, and read-as-SHA-1 code
is going to require having some code for NewHash, or a stand-in for
it, present in order for it to compile and be tested.  It's possible
that others could come up with more imaginative solutions that don't
require that, but I have my doubts.

> For trailing hashes for example, there's no need to move to a new
> hash which only costs us more cycles. We just use it as a fancy
> checksum to avoid bit flips. But then my assumption about cost may be
> completely wrong without experimenting.

I would argue that consistency is helpful.  Also, do we really want
people to be able to (eventually) create colliding packs that contain
different data?  That doesn't seem like a good idea.

But also, some of the candidates we're considering for NewHash are
actually faster than SHA-1.  So for performance reasons alone, it
might be useful to adopt a consistent scheme.
-- 
brian m. carlson: Houston, Texas, US
OpenPGP: https://keybase.io/bk2204
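
P.S. To make the objectFormat/compatObjectFormat distinction above a
bit more concrete, here is a minimal sketch of what the split might
look like at a call site.  The struct and field names are illustrative
stand-ins (loosely modeled on the the_hash_algo idea quoted upthread),
not Git's actual API, and a 256-bit NewHash is assumed purely for the
sake of the example:

#include <stdio.h>
#include <stddef.h>

/* Hypothetical hash descriptor; names are illustrative, not Git's. */
struct hash_algo {
	const char *name;
	size_t rawsz;	/* binary object-name length in bytes */
	size_t hexsz;	/* hex object-name length in characters */
};

static const struct hash_algo sha1    = { "sha1",    20, 40 };
static const struct hash_algo newhash = { "newhash", 32, 64 };

/* objectFormat: what the repository stores on disk. */
static const struct hash_algo *object_format = &newhash;
/* compatObjectFormat: what fetch/push keeps speaking for now. */
static const struct hash_algo *compat_object_format = &sha1;

int main(void)
{
	/* Storage code sizes its names from objectFormat... */
	printf("loose object name: %zu hex chars (%s)\n",
	       object_format->hexsz, object_format->name);
	/* ...while wire code stays pinned to the compat (SHA-1) format. */
	printf("wire object name:  %zu hex chars (%s)\n",
	       compat_object_format->hexsz, compat_object_format->name);
	return 0;
}

The point is simply that storage code sizes its buffers from whatever
hash the repository uses, while anything on the fetch/push path stays
pinned to SHA-1 until the protocol can deal with the new hash.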