* NewHash alternatives and SHA benchmarks
@ 2019-12-22  6:41 Michael Clark
  2019-12-23  1:13 ` brian m. carlson
  0 siblings, 1 reply; 4+ messages in thread

From: Michael Clark @ 2019-12-22  6:41 UTC (permalink / raw)
To: git; +Cc: Michael Clark

Hi Folks,

I have become interested in git's NewHash migration and have been
investigating alternate hash algorithm support in git. It seems that
the implementation of NewHash is still a way off, so it is still
possible to revise the choice of SHA256 in favour of a better
alternative.

Many of you may already be aware of hash length extension attacks [1].
While I am not aware of any git vulnerabilities based on length
extension, it seems to me that choosing a hash algorithm that is not
vulnerable to this attack could prevent a future exploit based on some
future protocol or other bug. Git hashes should be final in their
usage, so length extension resistance is a good property.

SHA3 is not vulnerable to length extension because it uses a sponge
construction internally: part of the internal state is never emitted,
so the state cannot be reconstructed from a hash value. Truncated SHA2
hashes such as SHA224 and SHA512/256 are also resistant to length
extension; however, SHA224 truncates only 32 bits of state, so it
reduces the probability of a successful extension by only 2^-32,
which, while not trivial, is a small margin. SHA512/256 or SHA3-256
seem to me the most attractive choices for safety against length
extension.

With this in mind, I decided to investigate the performance properties
of the alternative SHA hash algorithms. Of particular interest to me
are SHA3-256 and SHA512/256 (SHA512 truncated to 256 bits), as both
are the same size as SHA256 and are thus good NewHash contenders. I
was quite surprised by the benchmark results: SHA512/256 is roughly
1.5X faster than SHA256.
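The relative numbers are easy to sanity-check with a rough sketch
using Python's hashlib (OpenSSL-backed). Absolute MiB/s figures will
not match the optimized AVX2 results in the tables, but the SHA-512
advantage on 64-bit hardware is usually visible:

```python
# Rough relative throughput check for SHA-1/SHA-256/SHA-512 using
# Python's hashlib. Absolute numbers depend on the linked OpenSSL;
# only the ratio between sha256 and sha512 is of interest here.
import hashlib
import time

def throughput_mib_s(name, data, repeat=20):
    start = time.perf_counter()
    for _ in range(repeat):
        hashlib.new(name, data).digest()
    elapsed = time.perf_counter() - start
    return (len(data) * repeat / (1 << 20)) / elapsed

buf = b"\x00" * (1 << 20)  # 1 MiB buffer
for name in ("sha1", "sha256", "sha512"):
    print(f"{name:8s} {throughput_mib_s(name, buf):8.1f} MiB/s")
```

This is a single-shot micro-benchmark, not a substitute for the
steady-state measurements below.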
SHA512/256 seems a very good alternative to SHA256:

- SHA256: 8 x 32-bit state (256 bits), 64-byte block, 64 rounds
- SHA512: 8 x 64-bit state (512 bits), 128-byte block, 80 rounds

SHA512/256's good performance on 64-bit systems is understandable: it
runs only slightly more rounds per block (80 instead of 64) while
processing 128 bytes per block instead of 64. In practice, this makes
SHA512/256 about 1.5X faster than SHA256 on 64-bit systems. SHA3,
while competitive, is still significantly slower than SHA512, although
the AVX512 version of SHA3 is actually faster than SHA256. SHA256 is
not the best performing NewHash alternative by a long shot.

SHA Benchmarks on Intel Core i7-7980XE @ 4.6GHz
===============================================

(MiB/sec)         ext      16     128    1024    4096   65536  1048576
----------     ------  ------  ------  ------  ------  ------  -------
sha1             avx2    37.5   239.6   817.0  1097.5  1231.2   1244.8
sha224           avx2    30.8   170.5   439.3   524.3   560.2    564.4
sha256           avx2    31.2   171.2   438.7   525.0   560.9    564.8
sha512           avx2    27.1   167.2   561.5   747.1   835.7    842.8
sha512-224       avx2    26.8   164.7   558.8   747.9   836.5    845.1
sha512-256       avx2    26.6   163.9   557.9   751.4   835.0    843.0
sha3-224         avx2    38.9   313.8   411.5   474.6   491.1    490.7
sha3-256         avx2    39.7   317.3   414.5   445.3   464.9    464.3
sha3-384         avx2    39.8   184.6   338.1   348.4   356.5    358.0
sha3-512         avx2    40.0   185.0   228.4   245.9   248.0    249.0

(MiB/sec)         ext      16     128    1024    4096   65536  1048576
----------     ------  ------  ------  ------  ------  ------  -------
sha3-224       avx512    24.3   192.6   506.5   674.3   736.8    740.9
sha3-256       avx512    25.0   198.4   511.5   635.4   698.8    698.8
sha3-384       avx512    24.8   152.3   429.6   503.9   535.7    540.2
sha3-512       avx512    24.8   152.4   307.3   361.2   375.6    374.4

Recommendations
===============

After performing multiple benchmarks and thinking about this for
several weeks, I have come to the conclusion that SHA512/256 is a very
good alternative to SHA256, both for its length extension resistance
and for its performance. It seems that the NewHash implementation is
some way off, so it is not too late to change.
- Consider SHA512/256 for NewHash

SHA algorithm patches
=====================

I will be sending a patch series which adds the SHA algorithms that
are in the sha-algorithms branch of my git tree:

- https://github.com/michaeljclark/git.git
- https://github.com/michaeljclark/git/tree/sha-algorithms/sha

This is a precursor to asking some questions about the proposed
implementation of multiple or compat hashes (N-hash or 2-hash?). I am
curious about the pack format and directory structure for loose
objects. It seems there needs to be a single hash chosen for loose
object filenames, and auxiliary hashes will need a hash-to-hash map to
go from a compat hash to the new hash.

I have been reading the code, and it seems like it is going to be a
challenge to support more than one hash concurrently. My observation
is that the content-addressable storage layer could do with being a
little more abstract: there seems to be a lot of code that separately
computes hashes by accessing the hash algorithms directly, rather than
going through an abstract content-addressable storage layer, like the
VFS in Linux.

In any case, I hope you find this benchmark data useful.

Regards,
Michael

[1] https://en.wikipedia.org/wiki/Length_extension_attack
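The compat-hash mapping mentioned above can be pictured as a simple
bidirectional table. The class and method names below are hypothetical
illustrations, and this in-memory sketch says nothing about Git's
actual on-disk format:

```python
# Illustrative in-memory model of a bidirectional object-id mapping
# between a compat hash (e.g. SHA-1 hex ids) and NewHash hex ids.
# Git's real mapping is stored alongside the pack index on disk; this
# hypothetical class only shows the shape of the lookup problem.
class ObjectIdMap:
    def __init__(self):
        self._compat_to_new = {}
        self._new_to_compat = {}

    def add(self, compat_oid: str, new_oid: str) -> None:
        # Record the pairing in both directions so either id can be
        # used to find the other.
        self._compat_to_new[compat_oid] = new_oid
        self._new_to_compat[new_oid] = compat_oid

    def to_new(self, compat_oid: str) -> str:
        return self._compat_to_new[compat_oid]

    def to_compat(self, new_oid: str) -> str:
        return self._new_to_compat[new_oid]
```

Whichever hash names the loose object files, lookups in the other
algorithm would go through a table of this shape.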
* Re: NewHash alternatives and SHA benchmarks
  2019-12-22  6:41 NewHash alternatives and SHA benchmarks Michael Clark
@ 2019-12-23  1:13 ` brian m. carlson
  2019-12-23  1:59   ` Michael Clark
  0 siblings, 1 reply; 4+ messages in thread

From: brian m. carlson @ 2019-12-23  1:13 UTC (permalink / raw)
To: Michael Clark; +Cc: git

[-- Attachment #1: Type: text/plain, Size: 6260 bytes --]

On 2019-12-22 at 06:41:33, Michael Clark wrote:
> Recommendations
> ===============
>
> After performing multiple benchmarks and thinking about this for
> several weeks, I have come to the conclusion that SHA512/256 is a very
> good alternative to SHA256, both for its length extension resistance
> and for its performance. It seems that the NewHash implementation is
> some way off, so it is not too late to change.
>
> - Consider SHA512/256 for NewHash

This has been discussed extensively on the list. Git is not in general
vulnerable to length extension attacks because it hashes the length of
the object as part of the object header. This was documented as one of
the known aspects in the transition plan in
Documentation/technical/hash-function-transition.txt.

We've considered a lot of hash algorithms, including BLAKE2b-256,
SHA-256, SHA-512/256, SHA-3-256, SHAKE256 with a 256-bit length, and
others. This was no doubt controversial, but we discussed the pros and
cons of many options and this is what we decided. I will state that
while our choice was not one of my more preferred options, I value that
we have a consensus on the way forward.

Among the reasons we chose SHA-256 is performance on newer x86-64
processors with the SHA extensions, which support only SHA-1 and
SHA-256, and compatibility with a wide variety of cryptographic
libraries. I have no desire for Git to provide assembly implementations
of NewHash, and that means people must rely on existing cryptographic
libraries. Cryptographic libraries overwhelmingly do not support
SHA-512/256.
While OpenSSL does, it's only in the latest versions, and people
generally cannot legally distribute both Git and OpenSSL linked
together, so they must rely on system libraries for good performance.
Apple Common Crypto, for example, does not document any SHA-512/256
implementations.

There is also a benefit to chaining using a single algorithm for
signatures, which therefore necessitates using SHA-256 or one of the
other original SHA-2 algorithms until the next version of OpenPGP,
which supports SHA-3, is implemented.

Because we decided some time ago, I've sent in a bunch of patches to
our testsuite to make it work with SHA-256. Some of these patches are
general, in that they make the tests generate values which are used,
or they are specific to the length of the hash algorithm. Others use
specific hash values, and changing the hash algorithm will require
recomputing all of these values.

Absent a compelling security reason to abandon SHA-256, such as a
significant previously unknown cryptographic weakness, I don't plan to
reimplement all of this work. Updating our testsuite to work
successfully with SHA-256 has taken a huge amount of time, and this
work has been entirely done on my own free time because I want the Git
project to be successful. That doesn't even include the contributions
of others who have reviewed, discussed, and contributed to the current
work and transition plan. While I appreciate the benefits of
alternatives to SHA-256, I'm a bit annoyed that you're proposing to
throw away all of this work after we explicitly sought to build
consensus about an algorithm so that this stage of the work could
begin.

Of course, nothing prevents you from proposing a new algorithm and
fixing the testsuite to make it work, but I don't believe that we're
likely to adopt a different algorithm at this point, given the
previous discussion on the list.
The code is designed to be pluggable to support arbitrary algorithms,
but there are pieces which assume that the algorithm can be
distinguished by length (e.g., the dumb HTTP protocol), and therefore
you'd need to devise a different way to distinguish these versions.

If you'd like to see what a Git binary that supports multiple hash
algorithms looks like at the moment, you're welcome to look at my
transition-stage-4 branch at https://github.com/bk2204/git.git so you
can get a fuller understanding of what's involved. You can run the
testsuite with that code and GIT_TEST_DEFAULT_HASH=<hashname> to see
what passes and fails.

> SHA algorithm patches
> =====================
>
> I will be sending a patch series which adds the SHA algorithms that
> are in the sha-algorithms branch of my git tree:
>
> - https://github.com/michaeljclark/git.git
> - https://github.com/michaeljclark/git/tree/sha-algorithms/sha
>
> This is a precursor to asking some questions about the proposed
> implementation of multiple or compat hashes (N-hash or 2-hash?).
> I am curious about the pack format and directory structure for
> loose objects. It seems there needs to be a single hash chosen
> for loose object filenames, and auxiliary hashes will need a
> hash-to-hash map to go from a compat hash to the new hash.

That's correct. There's documentation in the transition plan about
what this will look like. A stage 4 implementation (where repositories
are either SHA-1 only or SHA-256 only) has been written and I'll send
it out once the remaining test fixes are merged.

I should point out that we specifically chose not to support a
plethora of algorithms because the goal is to encourage everybody to
adopt a single, secure algorithm. The network protocol will need to
interoperate using a single algorithm, which means the repository will
need to contain a mapping for that algorithm.
Since the current plan is to support no more than two algorithms at a
time, one of which will need to be SHA-1, allowing users to pick and
choose from a variety of algorithms harms interoperability. Seamless
interoperability between SHA-1 and NewHash repositories was considered
extremely important in the transition plan. That won't be coming
immediately with the stage 4 implementation, but it is planned for the
future.

My hope is that whatever we pick now will be secure for some time to
come, and that when we need to adopt a newer hash, whether due to
cryptographic advances or post-quantum cryptography, there will be no
existing SHA-1 repositories and the transition can take place from
NewHash to NewNewHash.

-- 
brian m. carlson: Houston, Texas, US
OpenPGP: https://keybase.io/bk2204

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 868 bytes --]
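Brian's point above, that Git hashes the object length as part of the
object header, is easy to verify: a blob id is the hash of
"blob <size>\0" followed by the content. A minimal sketch of that
preimage construction, matching what `git hash-object` computes for
SHA-1 repositories:

```python
# Compute a Git object id the way Git does: hash over a header that
# embeds the object type and byte length, then the content. Because
# the length is baked into the preimage, a plain length-extension of
# the content cannot yield a valid object id for the extended data.
import hashlib

def git_blob_oid(content: bytes, algo: str = "sha1") -> str:
    header = b"blob %d\x00" % len(content)
    return hashlib.new(algo, header + content).hexdigest()

# The empty blob has the well-known id e69de29bb2d1d6434b8b29ae775ad8c2e48c5391.
print(git_blob_oid(b""))
print(git_blob_oid(b"hello\n"))
```

The same construction works for any `hashlib` algorithm name, which is
how a NewHash repository would derive its object ids.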
* Re: NewHash alternatives and SHA benchmarks
  2019-12-23  1:13 ` brian m. carlson
@ 2019-12-23  1:59 ` Michael Clark
  2019-12-23 15:15   ` Philip Oakley
  0 siblings, 1 reply; 4+ messages in thread

From: Michael Clark @ 2019-12-23  1:59 UTC (permalink / raw)
To: brian m. carlson; +Cc: git

> On 23/12/2019, at 2:13 PM, brian m. carlson <sandals@crustytoothpaste.net> wrote:
>
> On 2019-12-22 at 06:41:33, Michael Clark wrote:
>> Recommendations
>> ===============
>>
>> After performing multiple benchmarks and thinking about this for
>> several weeks, I have come to the conclusion that SHA512/256 is a very
>> good alternative to SHA256, both for its length extension resistance
>> and for its performance. It seems that the NewHash implementation is
>> some way off, so it is not too late to change.
>>
>> - Consider SHA512/256 for NewHash
>
> This has been discussed extensively on the list. Git is not in general
> vulnerable to length extension attacks because it hashes the length of
> the object as part of the object header. This was documented as one of
> the known aspects in the transition plan in
> Documentation/technical/hash-function-transition.txt.
>
> We've considered a lot of hash algorithms, including BLAKE2b-256,
> SHA-256, SHA-512/256, SHA-3-256, SHAKE256 with a 256-bit length, and
> others. This was no doubt controversial, but we discussed the pros and
> cons of many options and this is what we decided. I will state that
> while our choice was not one of my more preferred options, I value that
> we have a consensus on the way forward.
>
> Among the reasons we chose SHA-256 is performance on newer x86-64
> processors with the SHA extensions, which support only SHA-1 and
> SHA-256, and compatibility with a wide variety of cryptographic
> libraries. I have no desire for Git to provide assembly implementations
> of NewHash, and that means people must rely on existing cryptographic
> libraries.
The SHA extensions are not generally available, and they cover SHA1,
which is on the cusp of its deprecation. It seems silly not to support
SHA512 when implementing SHA2 support, especially considering it's
faster on 64-bit. It's used in PBKDF2-SHA512 for iterated password
hashes, and SHA512/256 is just SHA512 with a different IV.

Don't be discouraged by my comments, and thanks for taking the time to
describe the state of things. The rationale of pragmatism should be
recorded in the hash migration plan. SHA256 will always take more
energy per byte than SHA512, which arguably means SHA512 is slightly
weaker per unit of attacker effort; that could itself be used as a
rationale for choosing SHA256. I don't think the hardware rationale is
acceptable, though, because purity of choice should lead what goes
into hardware, not the other way around.

I want pure hashes of the unpacked object, so I will have to think
about how I do that. Possibly SHA-3 too. Git doesn't really support
large files that well either, so I'll have to keep that in mind... If
the algorithm is not vulnerable to length extension, then the length
prefix is not required. Anyhow... I'll have to hack up some shell
scripts or something in the meantime.

> Cryptographic libraries overwhelmingly do not support SHA-512/256.
> While OpenSSL does, it's only in the latest versions, and people
> generally cannot legally distribute both Git and OpenSSL linked
> together, so they must rely on system libraries for good performance.
> Apple Common Crypto, for example, does not document any SHA-512/256
> implementations.
>
> There is also a benefit to chaining using a single algorithm for
> signatures, which therefore necessitates using SHA-256 or one of the
> other original SHA-2 algorithms until the next version of OpenPGP,
> which supports SHA-3, is implemented.
>
> Because we decided some time ago, I've sent in a bunch of patches to
> our testsuite to make it work with SHA-256.
> Some of these patches are general, in that they make the tests
> generate values which are used, or they are specific to the length of
> the hash algorithm. Others use specific hash values, and changing the
> hash algorithm will require recomputing all of these values.
>
> Absent a compelling security reason to abandon SHA-256, such as a
> significant previously unknown cryptographic weakness, I don't plan to
> reimplement all of this work. Updating our testsuite to work
> successfully with SHA-256 has taken a huge amount of time, and this
> work has been entirely done on my own free time because I want the Git
> project to be successful. That doesn't even include the contributions
> of others who have reviewed, discussed, and contributed to the current
> work and transition plan. While I appreciate the benefits of
> alternatives to SHA-256, I'm a bit annoyed that you're proposing to
> throw away all of this work after we explicitly sought to build
> consensus about an algorithm so that this stage of the work could
> begin.
>
> Of course, nothing prevents you from proposing a new algorithm and
> fixing the testsuite to make it work, but I don't believe that we're
> likely to adopt a different algorithm at this point, given the
> previous discussion on the list. The code is designed to be pluggable
> to support arbitrary algorithms, but there are pieces which assume
> that the algorithm can be distinguished by length (e.g., the dumb
> HTTP protocol) and therefore you'd need to devise a different way to
> distinguish these versions.
>
> If you'd like to see what a Git binary that supports multiple hash
> algorithms looks like at the moment, you're welcome to look at my
> transition-stage-4 branch at https://github.com/bk2204/git.git so you
> can get a fuller understanding of what's involved. You can run the
> testsuite with that code and GIT_TEST_DEFAULT_HASH=<hashname> to see
> what passes and fails.

Thanks. I'll check it out and try it.
My comments aren't directed at you; I'm interested in your work. Keep
it up. Well, take a break first. I'll probably be spending my holidays
reading git code, or maybe libgit2 :-p

>> SHA algorithm patches
>> =====================
>>
>> I will be sending a patch series which adds the SHA algorithms that
>> are in the sha-algorithms branch of my git tree:
>>
>> - https://github.com/michaeljclark/git.git
>> - https://github.com/michaeljclark/git/tree/sha-algorithms/sha
>>
>> This is a precursor to asking some questions about the proposed
>> implementation of multiple or compat hashes (N-hash or 2-hash?).
>> I am curious about the pack format and directory structure for
>> loose objects. It seems there needs to be a single hash chosen
>> for loose object filenames, and auxiliary hashes will need a
>> hash-to-hash map to go from a compat hash to the new hash.
>
> That's correct. There's documentation in the transition plan about
> what this will look like.
>
> A stage 4 implementation (where repositories are either SHA-1 only or
> SHA-256 only) has been written and I'll send it out once the remaining
> test fixes are merged.
>
> I should point out that we specifically chose not to support a
> plethora of algorithms because the goal is to encourage everybody to
> adopt a single, secure algorithm. The network protocol will need to
> interoperate using a single algorithm, which means the repository will
> need to contain a mapping for that algorithm. Since the current plan
> is to support no more than two algorithms at a time, one of which will
> need to be SHA-1, allowing users to pick and choose from a variety of
> algorithms harms interoperability. Seamless interoperability between
> SHA-1 and NewHash repositories was considered extremely important in
> the transition plan. That won't be coming immediately with the stage 4
> implementation, but it is planned for the future.
>
> My hope is that whatever we pick now will be secure for some time to
> come and that when we need to adopt a newer hash, whether due to
> cryptographic advance or post-quantum cryptography, there will be no
> existing SHA-1 repositories and transition can take place from NewHash
> to NewNewHash.
> --
> brian m. carlson: Houston, Texas, US
> OpenPGP: https://keybase.io/bk2204
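As Michael notes above, SHA-512/256 is SHA-512 run with different
initial hash values (per FIPS 180-4) and then truncated, so truncating
an ordinary SHA-512 digest does not produce SHA-512/256. A small
sketch; note that `sha512_256` only appears in Python's hashlib when
the linked OpenSSL provides it, which echoes brian's point about
library availability:

```python
# Contrast a plain truncation of SHA-512 with real SHA-512/256, which
# uses distinct IVs and therefore produces an unrelated digest. The
# sha512_256 branch only runs when the underlying OpenSSL exposes it.
import hashlib

data = b"abc"
truncated = hashlib.sha512(data).hexdigest()[:64]  # first 256 bits
print("truncated SHA-512:", truncated)

if "sha512_256" in hashlib.algorithms_available:
    real = hashlib.new("sha512_256", data).hexdigest()
    print("SHA-512/256:     ", real)
    assert real != truncated  # different IVs, different digest
else:
    print("sha512_256 not provided by this OpenSSL build")
```

This also means the "domain separation" between the two functions is
free: a SHA-512/256 value can never be mistaken for a truncated
SHA-512 of the same input.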
* Re: NewHash alternatives and SHA benchmarks
  2019-12-23  1:59 ` Michael Clark
@ 2019-12-23 15:15 ` Philip Oakley
  0 siblings, 0 replies; 4+ messages in thread

From: Philip Oakley @ 2019-12-23 15:15 UTC (permalink / raw)
To: Michael Clark, brian m. carlson; +Cc: git

On 23/12/2019 01:59, Michael Clark wrote:
> git doesn't really support large files that well either so I'll have
> to keep that in mind...

I presume you are thinking of the Git-for-Windows (GfW) implementation
here, where 'sizeof(long)' is 32 bits, so GfW is currently limited to
2/4 GB file sizes. The limitation is partly due to the zlib library
API, which uses 'long' for stream sizes (though it does have
pointer-width file sizing), and hence follows the Microsoft 32->64 bit
transition approach, where 'long' remains 32 bits on 64-bit Windows.
However, zlib is able to handle larger files when the
deflation/inflation is done in chunks smaller than 'sizeof(long)' can
describe.

On Linux the limit is a 64-bit address and file size, so this is
probably not a real issue on those platforms ;-)

There has been a body of work on changing Git to use size_t for file
sizes, but such a change is extensive and may not be acceptable
because of the size of the code churn. I (and others!) had been
looking at it: https://github.com/git-for-windows/git/pull/2179

Hope that helps clarify the large file support limitation on Windows.

Philip
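Philip's point that zlib can handle streams larger than a 'long' by
working in chunks can be sketched with the streaming API. The 64 KiB
chunk size here is arbitrary; any bound comfortably below a 32-bit
limit works:

```python
# Chunked deflate/inflate with zlib's streaming objects: no single
# call ever sees more than CHUNK bytes, so the total stream size is
# not constrained by what a 32-bit 'long' can describe.
import zlib

CHUNK = 64 * 1024  # arbitrary per-call bound

def deflate_chunked(data: bytes) -> bytes:
    c = zlib.compressobj()
    out = []
    for i in range(0, len(data), CHUNK):
        out.append(c.compress(data[i:i + CHUNK]))
    out.append(c.flush())  # emit any buffered tail
    return b"".join(out)

def inflate_chunked(blob: bytes) -> bytes:
    d = zlib.decompressobj()
    out = []
    for i in range(0, len(blob), CHUNK):
        out.append(d.decompress(blob[i:i + CHUNK]))
    out.append(d.flush())
    return b"".join(out)

payload = b"0123456789abcdef" * (1 << 16)  # ~1 MiB of compressible data
assert inflate_chunked(deflate_chunked(payload)) == payload
```

In C the same pattern is the classic z_stream loop with avail_in and
avail_out capped per call, which is what the chunked-zlib approach for
GfW amounts to.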