On Sat, Jul 16, 2016 at 03:48:49PM +0200, Herczeg Zsolt wrote:
> But - and that's the main idea i'm writing here - changing the storage
> keys does not mean you should drop your old hashes out. If you change
> the git data structure in a way, that it can keep multiple hashes for
> the same "link" in each objects (trees, commits, etc) you can keep the
> old ones right next to the new one. If you want to look up the
> referenced object, you must use the newest hash - which is the key.
> But if you want to verify some old hash, it's still possible! Just
> look up the objects by the new key, remove all the newer generation
> keys, and verify the old hash on that.
>
> A storage structure like this would allow a very great flexibility:
> - You can change your hash algorithm in the future. If SHA-256
> becomes broken, it's not a problem. Just re-hash the storage, and
> append the new hashes the git objects.
> - You can still verify your old hashes after a hash change - removing
> the new hashes from the objects before hashing should give you back
> the old objects, thus giving you the same hash as before.
> - That makes possible for signed tags, and commits to keep their
> validity after hash change! With a clever-enough new format, you can
> even keep the validity of current hashes and signs. (To be able to do
> that, you should be able to calculate back the current format from the
> new format.)
>
> Moving git forward to a format like this would solve the weak-key
> problem in git forever. You would be able to configure your key algo
> on a per repository basis, you - and git - can do the daily work on
> the newest hashes, while still carrying the old hashes and signatures,
> in case you ever want to verify them. That would allow repositories to
> gracefully change hashes in case they need to, and to only
> compatibility limitation is that you must use a new enough git to
> understand the new storage format.
>
> What are your thoughts on this approach?
> Will git ever reach a release
> with exchangeable hash algorithm? Or should someone look for
> alternatives if there's a need for cryptographic security?

I'm working on adding new hash algorithm support in Git. However, it
requires a significant refactor of the code base. My current plan is
not to implement side-by-side data, for a couple reasons.

One is that it requires significantly more work to implement and
complicates the code. It's also incompatible with all the refactoring
I've done already.

The second is that it requires that Git have the ability to store
multiple hashes at once, which is very expensive in terms of memory.
Moving from a 160-bit hash to a 256-bit hash (my current plan is
SHA3-256) requires 1.6× the memory. Storing both requires 2.6× the
memory. If you add a third hash, it requires even more. Memory is
often a constraint with using Git.

The current plan is to use git-fast-import and git-fast-export to handle
that conversion process, and then maybe provide wrappers to make it more
transparent.

Currently the process of the refactor is ongoing, but it is a free time
activity for me. If you'd like to follow the progress roughly, you can
do so by checking the output of the following commands:

  git grep 'unsigned char.*20' | wc -l
  git grep 'struct object_id' | wc -l

You are also welcome to contribute, of course.
-- 
brian m. carlson / brian with sandals: Houston, Texas, US
+1 832 623 2791 | https://www.crustytoothpaste.net/~bmc | My opinion only
OpenPGP: https://keybase.io/bk2204
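
For reference, the 1.6× and 2.6× memory figures in the reply follow
directly from the hash widths. The sketch below is only an illustration
of that arithmetic: it counts hash bits alone and assumes nothing else
in the in-memory object ID changes. The function name relative_memory
is hypothetical and not part of Git.

```python
# Hash widths in bits: SHA-1 is the current 160-bit hash, SHA3-256 the
# 256-bit candidate named in the reply.
SHA1_BITS = 160
SHA3_256_BITS = 256


def relative_memory(*hash_bits, baseline=SHA1_BITS):
    """Hash storage per object ID, relative to one baseline-sized hash.

    Illustrative helper only; it ignores every other field Git keeps.
    """
    return sum(hash_bits) / baseline


# Replacing the 160-bit hash with a 256-bit one.
print(relative_memory(SHA3_256_BITS))             # 1.6

# Keeping both hashes side by side.
print(relative_memory(SHA1_BITS, SHA3_256_BITS))  # 2.6
```

Passing a third width to the same function shows how the ratio keeps
growing with each additional hash stored alongside the others.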