On Thu, Aug 23, 2018 at 04:02:51PM +0200, Ævar Arnfjörð Bjarmason wrote:
> > [...]
> > Goals
> > -----
> > 1. The transition to SHA-256 can be done one local repository at a time.
> >    a. Requiring no action by any other party.
> >    b. A SHA-256 repository can communicate with SHA-1 Git servers
> >       (push/fetch).
> >    c. Users can use SHA-1 and SHA-256 identifiers for objects
> >       interchangeably (see "Object names on the command line", below).
> >    d. New signed objects make use of a stronger hash function than
> >       SHA-1 for their security guarantees.
> > 2. Allow a complete transition away from SHA-1.
> >    a. Local metadata for SHA-1 compatibility can be removed from a
> >       repository if compatibility with SHA-1 is no longer needed.
> > 3. Maintainability throughout the process.
> >    a. The object format is kept simple and consistent.
> >    b. Creation of a generalized repository conversion tool.
> >
> > Non-Goals
> > ---------
> > 1. Add SHA-256 support to Git protocol. This is valuable and the
> >    logical next step but it is out of scope for this initial design.
>
> This is a non-goal according to the docs, but now that we have protocol
> v2 in git, perhaps we could start specifying or describing how this
> protocol extension will work?

I have code that does this. The reason is that the first stage of the
transition code is to implement stage 4 of the transition: that is, a
full SHA-256 implementation without any SHA-1 support. Implementing it
that way means that we don't have to deal with any of the SHA-1 to
SHA-256 mapping in the first stage of the code.

In order to clone an SHA-256 repo (which the testsuite is completely
broken without), you need to be able to have basic SHA-256 support in
the protocol. I know this was a non-goal, but the alternative is an
inability to run the testsuite using SHA-256 until all the code is
merged, which is unsuitable for development.
The transition plan also anticipates stage 4 (full SHA-256) support
before earlier stages, so this will be required.

I hope to be able to spend some time documenting this in a little bit.
I have documentation for that code in my branch, but I haven't sent it
in yet.

I realize I have a lot of code that has not been sent in yet, but I also
tend to build on my own series a lot, and I probably need to be a bit
better about extracting reusable pieces that can go in independently
without waiting for the previous series to land.

> > [...]
> > 3. Intermixing objects using multiple hash functions in a single
> >    repository.
>
> But isn't that the goal now per "Translation table" & writing both SHA-1
> and SHA-256 versions of objects?

No, I think this statement is basically that you have to have the entire
repository use all one algorithm under the hood in the .git directory,
translation tables excluded. I don't think that's controversial.

> > [...]
> > Pack index
> > ~~~~~~~~~~
> > Pack index (.idx) files use a new v3 format that supports multiple
> > hash functions. They have the following format (all integers are in
> > network byte order):
> >
> > - A header appears at the beginning and consists of the following:
> >   - The 4-byte pack index signature: '\377t0c'
> >   - 4-byte version number: 3
> >   - 4-byte length of the header section, including the signature and
> >     version number
> >   - 4-byte number of objects contained in the pack
> >   - 4-byte number of object formats in this pack index: 2
> >   - For each object format:
> >     - 4-byte format identifier (e.g., 'sha1' for SHA-1)
>
> So, given that we have 4-byte limit and have decided on SHA-256 are we
> just going to call this 'sha2'? That might be confusingly ambiguous
> since SHA2 is a standard with more than just SHA-256, maybe 's256', or
> maybe we should give this 8 bytes with trailing \0s so we can have
> "SHA-1\0\0\0" and "SHA-256\0"?

This is the format_id field in struct git_hash_algo.
For SHA-1, I have 0x73686131, which is "sha1", big-endian, and for
SHA-256, I have 0x73323536, which is "s256", big-endian. The former is
in the codebase already; the latter, in my hash-impl branch.

If people have objections, we can change this up until we merge the pack
index v3 code (which is not yet finished). It needs to be unique, and
that's it. We could specify 0x00000001 and 0x00000002 if we wanted,
although I feel the values I mentioned above are self-documenting, which
is desirable.

> > [...]
> > - The trailer consists of the following:
> >   - A copy of the 20-byte SHA-256 checksum at the end of the
> >     corresponding packfile.
> >
> >   - 20-byte SHA-256 checksum of all of the above.
>
> We need to update both of these to 32 byte, right? Or are we planning to
> truncate the checksums?
>
> This seems like just a mistake when we did s/NewHash/SHA-256/g, but then
> again it was originally "20-byte NewHash checksum" ever since 752414ae43
> ("technical doc: add a design doc for hash function transition",
> 2017-09-27), so what do we mean here?

Yes, this will be 32 bytes. The code I have uses 32 bytes, because
truncating it means that we have to write special code just for that
case, which seems silly.
-- 
brian m. carlson: Houston, Texas, US
OpenPGP: https://keybase.io/bk2204
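
As a rough illustration of the two points above (the 4-byte format
identifiers and the 32-byte checksum), the following Python sketch shows
how a 4-character ASCII tag maps to the big-endian 32-bit values quoted
in the message, and how the raw digest sizes of the two hashes differ.
It is not taken from Git's code: the helper names format_id() and
format_tag() and the constant names are hypothetical, chosen only for
this example.

```python
import hashlib
import struct


def format_id(tag: str) -> int:
    """Read a 4-character ASCII tag as a big-endian (network order) uint32.

    Hypothetical helper for illustration; not a Git function.
    """
    raw = tag.encode("ascii")
    if len(raw) != 4:
        raise ValueError("format identifier must be exactly 4 bytes")
    return struct.unpack(">I", raw)[0]


def format_tag(value: int) -> str:
    """Inverse of format_id(): turn the uint32 back into its ASCII tag."""
    return struct.pack(">I", value).decode("ascii")


# The two identifiers mentioned in the message.
SHA1_FORMAT_ID = format_id("sha1")
SHA256_FORMAT_ID = format_id("s256")

# Raw (untruncated) digest sizes in bytes, as reported by hashlib.
SHA1_RAWSZ = hashlib.sha1().digest_size
SHA256_RAWSZ = hashlib.sha256().digest_size

if __name__ == "__main__":
    print(f"sha1 -> {SHA1_FORMAT_ID:#010x}, raw size {SHA1_RAWSZ}")
    print(f"s256 -> {SHA256_FORMAT_ID:#010x}, raw size {SHA256_RAWSZ}")
```

Because the identifier is just the four ASCII bytes in network byte
order, it stays readable in a hex dump of the index, which is the
"self-documenting" property mentioned above; a value such as 0x00000001
would be equally unique but would not have it.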