On Sun, Mar 05, 2017 at 01:45:46PM +0000, Ian Jackson wrote:
> brian m. carlson writes ("Re: Transition plan for git to move to a new hash function"):
> > Instead, I was referring to areas like the notes code. It has extensive
> > use of the last byte as a type of lookup table key. It's very dependent
> > on having exactly one hash, since it will always want to use the last
> > byte.
>
> You mean note_tree_search ? (My tree here may be a bit out of date.)
> This doesn't seem difficult to fix. The nontrivial changes would be
> mostly confined to SUBTREE_SHA1_PREFIXCMP and GET_NIBBLE.
>
> It's true that like most of git there's a lot of hardcoded `sha1'.

I'm talking about the entire notes.c file. There are several different
uses of "19" in there, and they represent at least two separate
concepts. My object-id-part9 series tries to split those out into
logical constants.

This code is not going to handle repositories with different-length
objects well, which I believe was your initial proposal. I originally
thought that mixed-hash repositories would be viable as well, but I no
longer do.

> Are you arguing in favour of "replace git with git2 by simply
> s/20/64/g; s/sha1/blake/g" ? This seems to me to be a poor idea.
> Takeup of the new `git2' would be very slow because of the pain
> involved.

I'm arguing that the same binary ought to be able to handle both SHA-1
and the new hash. I'm also arguing that a given object have exactly one
hash and that we not mix hashes in the same object. A repository will
be composed of one type of object, and if that's the new hash, a lookup
table will be used to translate SHA-1 object names. We can synthesize
the old objects, should we need them.

That allows people to use the SHA-1 hashes (in my view, with a prefix,
such as "sha1:") in repositories using the new hash. It also allows
verifying old tags and commits if need be.

What I *would* like to see is an extension to the tag and commit objects
which names the hash that was used to make them.
That makes it easy to determine which object the signature should be
verified over, as it will verify over only one of them.

> [1] I've heard suggestions here that instead we should expect users to
> "git1 fast-export", which you would presumably feed into "git2
> fast-import". But what is `git1' here ? Is it the current git
> codebase frozen in time ? I don't think it can be. With this
> conversion strategy, we will need to maintain git1 for decades. It
> will need portability fixes, security fixes, fixes for new hostile
> compiler optimisations, and so on. The difficulty of conversion means
> there will be pressure to backport new features from `git2' to `git1'.
> (Also this approach means that all signatures are definitively lost
> during the conversion process.)

I'm proposing we have a git hash-convert (the name doesn't matter that
much) that converts in place. It rebuilds the objects and builds a
lookup table. Since the contents of git objects are deterministic, this
makes it possible for each individual user to make the transition in
place.

-- 
brian m. carlson / brian with sandals: Houston, Texas, US
+1 832 623 2791 | https://www.crustytoothpaste.net/~bmc | My opinion only
OpenPGP: https://keybase.io/bk2204