git@vger.kernel.org mailing list mirror (one of many)
* NewHash alternatives and SHA benchmarks
@ 2019-12-22  6:41 Michael Clark
  2019-12-23  1:13 ` brian m. carlson
  0 siblings, 1 reply; 4+ messages in thread
From: Michael Clark @ 2019-12-22  6:41 UTC (permalink / raw)
  To: git; +Cc: Michael Clark

Hi Folks,

I have become interested in git's NewHash migration and have been
investigating alternative hash algorithm support in git. It seems the
NewHash implementation is still some way off, so it is not too late to
revise the choice of SHA256 for a better alternative.

Many of you may already be aware of hash length extension attacks [1].
While I am not aware of any git vulnerability based on length extension,
it seems to me that choosing a hash algorithm that is not vulnerable to
this attack could prevent a future exploit arising from some future
protocol or other bug. A git hash should be final once computed, so
length extension resistance is a good property.

SHA3 is not vulnerable to length extension because it uses a sponge
construction internally: only part of the state is ever output, so the
internal state cannot be reconstructed from a hash value. Truncated
SHA2 hashes such as SHA224 and SHA512/256 are also not vulnerable to
length extension; however, SHA224 withholds only 32 bits of state, so
an attacker need only guess 2^32 values to extend a hash, which, while
not trivial, is a small margin.

SHA512/256 and SHA3-256 seem to me the most attractive choices for
safety against length extension.

With this in mind, I decided to investigate the performance of the
alternative SHA hash algorithms. Of particular interest to me are
SHA3-256 and SHA512/256 (SHA512 with distinct initial values, truncated
to 256 bits), as both produce digests the same size as SHA256 and are
therefore good NewHash contenders.
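
As a side note, SHA512/256 is not simply the first 32 bytes of a SHA512
digest; it uses its own initial values. A minimal sketch that shows
this, assuming OpenSSL 1.1.1 or later for EVP_sha512_256() (build with
cc demo.c -lcrypto):

/* Sketch only: SHA-512/256 vs. a hand-truncated SHA-512 digest. */
#include <stdio.h>
#include <string.h>
#include <openssl/evp.h>

static void digest(const EVP_MD *md, const char *msg,
                   unsigned char *out, unsigned int *outlen)
{
    EVP_MD_CTX *ctx = EVP_MD_CTX_new();
    EVP_DigestInit_ex(ctx, md, NULL);
    EVP_DigestUpdate(ctx, msg, strlen(msg));
    EVP_DigestFinal_ex(ctx, out, outlen);
    EVP_MD_CTX_free(ctx);
}

int main(void)
{
    unsigned char a[EVP_MAX_MD_SIZE], b[EVP_MAX_MD_SIZE];
    unsigned int alen, blen;

    digest(EVP_sha512_256(), "abc", a, &alen);  /* 32-byte digest */
    digest(EVP_sha512(),     "abc", b, &blen);  /* 64-byte digest */

    /* Different initial hash values, so the outputs do not match. */
    printf("sha512/256 == sha512[0:32]? %s\n",
           memcmp(a, b, 32) ? "no" : "yes");
    return 0;
}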

I was quite surprised by the benchmark results, which show SHA512/256
roughly 1.5x faster than SHA256. SHA512/256 seems a very good
alternative to SHA256:

- SHA256: 8 x 32-bit state words (256 bits), 64-byte block, 64 rounds
- SHA512: 8 x 64-bit state words (512 bits), 128-byte block, 80 rounds

SHA512/256's good performance on 64-bit systems is understandable: it
runs only slightly more rounds per block (80 instead of 64) while
processing 128 bytes per block instead of 64, i.e. 0.625 rounds per
byte versus 1 round per byte for SHA256. In practice, this means
SHA512/256 is about 1.5x faster than SHA256 on 64-bit systems.

SHA3, while competitive, is still significantly slower than SHA512,
although the AVX512 version of SHA3 is actually faster than SHA256.
SHA256 is not the best-performing NewHash alternative by a long shot.

SHA Benchmarks on Intel Core i7-7980XE @ 4.6GHz
===============================================

Throughput in MiB/sec by message size in bytes; 'ext' is the SIMD
extension used by the implementation.

(MiB/sec)\      ext      16     128    1024    4096   65536 1048576
     -----    -----   -----   -----   -----   -----   -----   -----
      sha1     avx2    37.5   239.6   817.0  1097.5  1231.2  1244.8
    sha224     avx2    30.8   170.5   439.3   524.3   560.2   564.4
    sha256     avx2    31.2   171.2   438.7   525.0   560.9   564.8
    sha512     avx2    27.1   167.2   561.5   747.1   835.7   842.8
sha512-224     avx2    26.8   164.7   558.8   747.9   836.5   845.1
sha512-256     avx2    26.6   163.9   557.9   751.4   835.0   843.0
  sha3-224     avx2    38.9   313.8   411.5   474.6   491.1   490.7
  sha3-256     avx2    39.7   317.3   414.5   445.3   464.9   464.3
  sha3-384     avx2    39.8   184.6   338.1   348.4   356.5   358.0
  sha3-512     avx2    40.0   185.0   228.4   245.9   248.0   249.0

(MiB/sec)\      ext      16     128    1024    4096   65536 1048576
     -----    -----   -----   -----   -----   -----   -----   -----
  sha3-224   avx512    24.3   192.6   506.5   674.3   736.8   740.9
  sha3-256   avx512    25.0   198.4   511.5   635.4   698.8   698.8
  sha3-384   avx512    24.8   152.3   429.6   503.9   535.7   540.2
  sha3-512   avx512    24.8   152.4   307.3   361.2   375.6   374.4
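
The harness used for the numbers above is not included here, but a
rough sketch of how similar throughput figures can be reproduced with
OpenSSL's EVP interface follows (the digest names, e.g. "sha512-256",
are assumptions about what the linked OpenSSL registers; build with
cc bench.c -lcrypto):

/* Rough throughput sketch: hash a fixed-size buffer repeatedly for
 * about one second of CPU time and report MiB/sec. Not the tool used
 * for the table above. */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <openssl/evp.h>

static double bench(const char *name, size_t bufsz)
{
    const EVP_MD *md = EVP_get_digestbyname(name);
    unsigned char out[EVP_MAX_MD_SIZE], *buf;
    unsigned int outlen;
    size_t total = 0;
    clock_t end;

    if (!md)
        return 0.0;                 /* digest not provided by this build */

    buf = calloc(1, bufsz);
    end = clock() + CLOCKS_PER_SEC; /* run for roughly one second */
    do {
        EVP_MD_CTX *ctx = EVP_MD_CTX_new();
        EVP_DigestInit_ex(ctx, md, NULL);
        EVP_DigestUpdate(ctx, buf, bufsz);
        EVP_DigestFinal_ex(ctx, out, &outlen);
        EVP_MD_CTX_free(ctx);
        total += bufsz;
    } while (clock() < end);
    free(buf);

    return total / (1024.0 * 1024.0);
}

int main(void)
{
    const char *names[] = { "sha1", "sha256", "sha512-256", "sha3-256" };
    size_t sizes[] = { 16, 128, 1024, 4096, 65536, 1048576 };

    for (size_t i = 0; i < sizeof(names) / sizeof(*names); i++) {
        printf("%12s", names[i]);
        for (size_t j = 0; j < sizeof(sizes) / sizeof(*sizes); j++)
            printf(" %8.1f", bench(names[i], sizes[j]));
        printf("\n");
    }
    return 0;
}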

Recommendations
===============

After performing multiple benchmarks and thinking about this for
several weeks I have come to the conclusion that SHA512/256 is a
very good alternative to SHA256, both from the perspective of its
length extension resistance, and performance. It seems that NewHash
implementation is some way off so it is not too late to change.

- Consider SHA512/256 for NewHash

SHA algorithm patches
=====================

I will be sending a patch series which adds these SHA algorithms; the
patches are in the sha-algorithms branch of my git tree.

- https://github.com/michaeljclark/git.git
- https://github.com/michaeljclark/git/tree/sha-algorithms/sha

This is a precursor to asking some questions about the proposed
implementation of multiple or compat hashes (N-hash or 2-hash?).
I am curious about the pack format and the directory structure for
loose objects. It seems a single hash needs to be chosen for loose
object filenames, and auxiliary hashes will need a hash-to-hash map
to go from a compat hash to the new hash.
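
Purely to illustrate the kind of mapping I mean (hypothetical names and
layout, not the format proposed in the transition plan):

/* Hypothetical sketch of a compat-hash <-> new-hash mapping; every
 * name here is invented for illustration and is not git's API. */
#include <string.h>

#define RAWSZ_SHA1   20
#define RAWSZ_SHA256 32

struct oid_pair {
    unsigned char sha1[RAWSZ_SHA1];     /* compat object name */
    unsigned char sha256[RAWSZ_SHA256]; /* new object name */
};

struct oid_map {
    struct oid_pair *pairs;
    size_t nr;
};

/* Translate a new-hash object name back to its compat name, e.g. when
 * talking to a peer that only understands SHA1. Linear scan for
 * brevity. */
static const unsigned char *compat_name(const struct oid_map *map,
                                        const unsigned char *sha256)
{
    for (size_t i = 0; i < map->nr; i++)
        if (!memcmp(map->pairs[i].sha256, sha256, RAWSZ_SHA256))
            return map->pairs[i].sha1;
    return NULL;                        /* unknown object */
}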

I have been reading the code, and it seems it is going to be a
challenge to support more than one hash concurrently. My observation
is that the content-addressable storage layer could be a little more
abstract: there seems to be a lot of code that computes hashes by
calling the hash algorithms directly rather than going through an
abstract content-addressable storage layer, analogous to the VFS in
Linux.
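
Roughly, I am imagining something along these lines (hypothetical
names, a sketch only; git's hash.h already has a per-algorithm table of
function pointers, but the object store itself is what I would like to
see behind an interface):

/* Hypothetical sketch of a VFS-like object store interface; the names
 * are invented for illustration and are not git's actual API. */
#include <stddef.h>

struct object_store_ops {
    /* Hash content of the given type and write the object name into
     * 'name'; its length depends on the store's hash algorithm. */
    int (*hash_object)(void *store, const char *type,
                       const void *buf, size_t len, unsigned char *name);
    /* Store an object and return its name. */
    int (*write_object)(void *store, const char *type,
                        const void *buf, size_t len, unsigned char *name);
    /* Read an object back by name into a newly allocated buffer. */
    void *(*read_object)(void *store, const unsigned char *name,
                         char *type_out, size_t *len_out);
};

/* Callers would go through these ops rather than calling the SHA1 or
 * SHA256 routines directly, so the hash algorithm stays an internal
 * detail of the store, much as filesystems sit behind the VFS. */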

In any case, I hope you find this benchmark data useful.

Regards,

Michael

[1] https://en.wikipedia.org/wiki/Length_extension_attack


* Re: NewHash alternatives and SHA benchmarks
  2019-12-22  6:41 NewHash alternatives and SHA benchmarks Michael Clark
@ 2019-12-23  1:13 ` brian m. carlson
  2019-12-23  1:59   ` Michael Clark
  0 siblings, 1 reply; 4+ messages in thread
From: brian m. carlson @ 2019-12-23  1:13 UTC (permalink / raw)
  To: Michael Clark; +Cc: git

On 2019-12-22 at 06:41:33, Michael Clark wrote:
> Recommendations
> ===============
> 
> After performing multiple benchmarks and thinking about this for
> several weeks I have come to the conclusion that SHA512/256 is a
> very good alternative to SHA256, both from the perspective of its
> length extension resistance, and performance. It seems that NewHash
> implementation is some way off so it is not too late to change.
> 
> - Consider SHA512/256 for NewHash

This has been discussed extensively on the list.  Git is not in general
vulnerable to length extension attacks because it hashes the length of
the object as part of the object header.  This was documented as one of
the known aspects in the transition plan in
Documentation/technical/hash-function-transition.txt.
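
To make that concrete, here is a minimal sketch (not Git source, and
using OpenSSL's legacy SHA1_* interface purely for illustration) that
reproduces a blob's object name; the length is hashed as part of the
"blob <size>\0" header before the content:

/* Sketch: reproduce `printf 'hello\n' | git hash-object --stdin`.
 * Build: cc blob-id.c -lcrypto */
#include <stdio.h>
#include <string.h>
#include <openssl/sha.h>

int main(void)
{
    const char *body = "hello\n";               /* blob contents */
    char header[32];
    unsigned char md[SHA_DIGEST_LENGTH];
    SHA_CTX ctx;
    int hdrlen;

    /* The object header includes the length and a NUL terminator. */
    hdrlen = snprintf(header, sizeof(header), "blob %zu",
                      strlen(body)) + 1;        /* "blob 6\0" */

    SHA1_Init(&ctx);
    SHA1_Update(&ctx, header, hdrlen);
    SHA1_Update(&ctx, body, strlen(body));
    SHA1_Final(md, &ctx);

    for (int i = 0; i < SHA_DIGEST_LENGTH; i++)
        printf("%02x", md[i]);
    /* prints ce013625030ba8dba906f756967f9e9ca394464a */
    printf("\n");
    return 0;
}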

We've considered a lot of hash algorithms, including BLAKE2b-256,
SHA-256, SHA-512/256, SHA-3-256, SHAKE256 with a 256-bit length, and
others.  This was no doubt controversial, but we discussed the pros and
cons of many options and this is what we decided.  I will state that
while our choice was not one of my more preferred options, I value that
we have a consensus on the way forward.

Among the reasons we chose SHA-256 is performance on newer x86-64
processors with the SHA extensions, which support only SHA-1 and
SHA-256, and compatibility with a wide variety of cryptographic
libraries.  I have no desire for Git to provide assembly implementations
of NewHash, and that means people must rely on existing cryptographic
libraries.

Cryptographic libraries overwhelmingly do not support SHA-512/256.
While OpenSSL does, it's only in the latest versions, and people
generally cannot legally distribute both Git and OpenSSL linked
together, so they must rely on system libraries for good performance.
Apple Common Crypto, for example, does not document any SHA-512/256
implementations.
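
(If you want to check what a particular build provides, here is a quick
sketch; the digest name "sha512-256" is my assumption about what a
recent OpenSSL registers.)

/* Sketch: probe the linked OpenSSL for SHA-512/256 support.
 * Build: cc probe.c -lcrypto */
#include <stdio.h>
#include <openssl/evp.h>

int main(void)
{
    const EVP_MD *md = EVP_get_digestbyname("sha512-256");
    printf("SHA-512/256: %s\n", md ? "available" : "not available");
    return 0;
}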

There is also a benefit to chaining using a single algorithm for
signatures, which therefore necessitates using SHA-256 or one of the
other original SHA-2 algorithms until the next version of OpenPGP is
implemented which supports SHA-3.

Because we decided some time ago, I've sent in a bunch of patches to our
testsuite to make it work with SHA-256.  Some of these patches are
general, in that they make the tests generate values which are used, or
they are specific to the length of the hash algorithm.  Others use
specific hash values, and changing the hash algorithm will require
recomputing all of these values.

Absent a compelling security reason to abandon SHA-256, such as a
significant previously unknown cryptographic weakness, I don't plan to
reimplement all of this work.  Updating our testsuite to work
successfully with SHA-256 has taken a huge amount of time, and this work
has been entirely done on my own free time because I want the Git
project to be successful.  That doesn't even include the contributions
of others who have reviewed, discussed, and contributed to the current
work and transition plan.  While I appreciate the benefits of
alternatives to SHA-256, I'm a bit annoyed that you're proposing to
throw away all of this work after we explicitly sought to build
consensus about an algorithm so that this stage of the work could begin.

Of course, nothing prevents you from proposing a new algorithm and
fixing the testsuite to make it work, but I don't believe that we're
likely to adopt a different algorithm at this point, given the previous
discussion on the list.  The code is designed to be pluggable to support
arbitrary algorithms, but there are pieces which assume that the
algorithm can be distinguished by length (e.g., the dumb HTTP protocol)
and therefore you'd need to devise a different way to distinguish these
versions.

If you'd like to see what a Git binary that supports multiple hash
algorithms looks like at the moment, you're welcome to look at my
transition-stage-4 branch at https://github.com/bk2204/git.git so you
can get a fuller understanding of what's involved.  You can run the
testsuite with that code and GIT_TEST_DEFAULT_HASH=<hashname> to see
what passes and fails.

> SHA algorithm patches
> =====================
> 
> I will be sending a patch series which adds these SHA algorithms; the
> patches are in the sha-algorithms branch of my git tree.
> 
> - https://github.com/michaeljclark/git.git
> - https://github.com/michaeljclark/git/tree/sha-algorithms/sha
> 
> This is a precursor to asking some questions about the proposed
> implementation of multiple or compat hashes (N-hash or 2-hash?).
> I am curious about the pack format and the directory structure for
> loose objects. It seems a single hash needs to be chosen for loose
> object filenames, and auxiliary hashes will need a hash-to-hash map
> to go from a compat hash to the new hash.

That's correct.  There's documentation in the transition plan about what
this will look like.

A stage 4 implementation (where repositories are either SHA-1 only or
SHA-256 only) has been written and I'll send it out once the remaining
test fixes are merged.

I should point out that we specifically chose not to support a plethora
of algorithms because the goal is to encourage everybody to adopt a
single, secure algorithm.  The network protocol will need to
interoperate using a single algorithm, which means the repository will
need to contain a mapping for that algorithm.  Since the current plan is
to support no more than two algorithms at a time, one of which will need
to be SHA-1, allowing users to pick and choose from a variety of
algorithms harms interoperability.  Seamless interoperability between
SHA-1 and NewHash repositories was considered extremely important in the
transition plan.  That won't be coming immediately with the stage 4
implementation, but it is planned for the future.

My hope is that whatever we pick now will be secure for some time to
come and that when we need to adopt a newer hash, whether due to
cryptographic advance or post-quantum cryptography, there will be no
existing SHA-1 repositories and transition can take place from NewHash
to NewNewHash.
-- 
brian m. carlson: Houston, Texas, US
OpenPGP: https://keybase.io/bk2204


* Re: NewHash alternatives and SHA benchmarks
  2019-12-23  1:13 ` brian m. carlson
@ 2019-12-23  1:59   ` Michael Clark
  2019-12-23 15:15     ` Philip Oakley
  0 siblings, 1 reply; 4+ messages in thread
From: Michael Clark @ 2019-12-23  1:59 UTC (permalink / raw)
  To: brian m. carlson; +Cc: git



> On 23/12/2019, at 2:13 PM, brian m. carlson <sandals@crustytoothpaste.net> wrote:
> 
> On 2019-12-22 at 06:41:33, Michael Clark wrote:
>> Recommendations
>> ===============
>> 
>> After performing multiple benchmarks and thinking about this for
>> several weeks I have come to the conclusion that SHA512/256 is a
>> very good alternative to SHA256, both from the perspective of its
>> length extension resistance, and performance. It seems that NewHash
>> implementation is some way off so it is not too late to change.
>> 
>> - Consider SHA512/256 for NewHash
> 
> This has been discussed extensively on the list.  Git is not in general
> vulnerable to length extension attacks because it hashes the length of
> the object as part of the object header.  This was documented as one of
> the known aspects in the transition plan in
> Documentation/technical/hash-function-transition.txt.
> 
> We've considered a lot of hash algorithms, including BLAKE2b-256,
> SHA-256, SHA-512/256, SHA-3-256, SHAKE256 with a 256-bit length, and
> others.  This was no doubt controversial, but we discussed the pros and
> cons of many options and this is what we decided.  I will state that
> while our choice was not one of my more preferred options, I value that
> we have a consensus on the way forward.
> 
> Among the reasons we chose SHA-256 is performance on newer x86-64
> processors with the SHA extensions, which support only SHA-1 and
> SHA-256, and compatibility with a wide variety of cryptographic
> libraries.  I have no desire for Git to provide assembly implementations
> of NewHash, and that means people must rely on existing cryptographic
> libraries.

The SHA extensions are not generally available, and they include SHA1, which is on the cusp of deprecation. It seems silly not to support SHA512 when implementing SHA2 support, especially considering it’s faster on 64-bit systems.

SHA512 is used in PBKDF2-SHA512 for iterated password hashes, and SHA512/256 is just SHA512 with a different IV.

Don’t be discouraged by my comments. Thanks for taking the time to describe the current state. The pragmatic rationale should be captured in the hash migration plan.

SHA256 will always take more energy than SHA512 to hash the same data, which arguably means SHA512 is possibly slightly weaker; that could be used as a rationale for choosing SHA256. I don’t think the other rationale is acceptable though, because the purity of the choice should lead what goes into hardware, not the other way around.

I want pure hashes of the unpacked object, so I will have to think about how to do that; possibly SHA-3 too. git doesn’t really support large files that well either so I’ll have to keep that in mind... If the algorithm is not vulnerable to length extension, then the length prefix is not required. Anyhow...

I’ll have to hack up some shell scripts or something in the meantime.

> Cryptographic libraries overwhelmingly do not support SHA-512/256.
> While OpenSSL does, it's only in the latest versions, and people
> generally cannot legally distribute both Git and OpenSSL linked
> together, so they must rely on system libraries for good performance.
> Apple Common Crypto, for example, does not document any SHA-512/256
> implementations.
> 
> There is also a benefit to chaining using a single algorithm for
> signatures, which therefore necessitates using SHA-256 or one of the
> other original SHA-2 algorithms until the next version of OpenPGP is
> implemented which supports SHA-3.
> 
> Because we decided some time ago, I've sent in a bunch of patches to our
> testsuite to make it work with SHA-256.  Some of these patches are
> general, in that they make the tests generate values which are used, or
> they are specific to the length of the hash algorithm.  Others use
> specific hash values, and changing the hash algorithm will require
> recomputing all of these values.
> 
> Absent a compelling security reason to abandon SHA-256, such as a
> significant previously unknown cryptographic weakness, I don't plan to
> reimplement all of this work.  Updating our testsuite to work
> successfully with SHA-256 has taken a huge amount of time, and this work
> has been entirely done on my own free time because I want the Git
> project to be successful.  That doesn't even include the contributions
> of others who have reviewed, discussed, and contributed to the current
> work and transition plan.  While I appreciate the benefits of
> alternatives to SHA-256, I'm a bit annoyed that you're proposing to
> throw away all of this work after we explicitly sought to build
> consensus about an algorithm so that this stage of the work could begin.
> 
> Of course, nothing prevents you from proposing a new algorithm and
> fixing the testsuite to make it work, but I don't believe that we're
> likely to adopt a different algorithm at this point, given the previous
> discussion on the list.  The code is designed to be pluggable to support
> arbitrary algorithms, but there are pieces which assume that the
> algorithm can be distinguished by length (e.g., the dumb HTTP protocol)
> and therefore you'd need to devise a different way to distinguish these
> versions.
> 
> If you'd like to see what a Git binary that supports multiple hash
> algorithms looks like at the moment, you're welcome to look at my
> transition-stage-4 branch at https://github.com/bk2204/git.git so you
> can get a fuller understanding of what's involved.  You can run the
> testsuite with that code and GIT_TEST_DEFAULT_HASH=<hashname> to see
> what passes and fails.

Thanks. I’ll check it out and try it.

My comments aren’t directed at you. I’m interested in your work. Keep it up. Well, take a break. idk. Whatever. I’ll probably be spending my holidays reading git code, or maybe libgit2 :-p

>> SHA algorithm patches
>> =====================
>> 
>> I will be sending a patch series which adds these SHA algorithms; the
>> patches are in the sha-algorithms branch of my git tree.
>> 
>> - https://github.com/michaeljclark/git.git
>> - https://github.com/michaeljclark/git/tree/sha-algorithms/sha
>> 
>> This is a precursor to asking some questions about the proposed
>> implementation of multiple or compat hashes (N-hash or 2-hash?).
>> I am curious about the pack format and the directory structure for
>> loose objects. It seems a single hash needs to be chosen for loose
>> object filenames, and auxiliary hashes will need a hash-to-hash map
>> to go from a compat hash to the new hash.
> 
> That's correct.  There's documentation in the transition plan about what
> this will look like.
> 
> A stage 4 implementation (where repositories are either SHA-1 only or
> SHA-256 only) has been written and I'll send it out once the remaining
> test fixes are merged.
> 
> I should point out that we specifically chose not to support a plethora
> of algorithms because the goal is to encourage everybody to adopt a
> single, secure algorithm.  The network protocol will need to
> interoperate using a single algorithm, which means the repository will
> need to contain a mapping for that algorithm.  Since the current plan is
> to support no more than two algorithms at a time, one of which will need
> to be SHA-1, allowing users to pick and choose from a variety of
> algorithms harms interoperability.  Seamless interoperability between
> SHA-1 and NewHash repositories was considered extremely important in the
> transition plan.  That won't be coming immediately with the stage 4
> implementation, but it is planned for the future.
> 
> My hope is that whatever we pick now will be secure for some time to
> come and that when we need to adopt a newer hash, whether due to
> cryptographic advance or post-quantum cryptography, there will be no
> existing SHA-1 repositories and transition can take place from NewHash
> to NewNewHash.
> -- 
> brian m. carlson: Houston, Texas, US
> OpenPGP: https://keybase.io/bk2204



* Re: NewHash alternatives and SHA benchmarks
  2019-12-23  1:59   ` Michael Clark
@ 2019-12-23 15:15     ` Philip Oakley
  0 siblings, 0 replies; 4+ messages in thread
From: Philip Oakley @ 2019-12-23 15:15 UTC (permalink / raw)
  To: Michael Clark, brian m. carlson; +Cc: git

On 23/12/2019 01:59, Michael Clark wrote:
> git doesn’t really support large files that well either so I’ll have to keep that in mind...
I presume you are thinking of the Git-for-Windows (GfW) implementation
here, where sizeof(long) is 32 bits, so GfW is currently limited to
2/4 GB file sizes.

The limitation is partly due to the zlib library API, which uses 'long'
for stream sizes (though it does have pointer-based sizing), and hence
follows the Microsoft 32->64 bit transition approach, where 'long'
remains 32 bits while pointers widen. However, zlib is able to handle
larger files when the deflation/inflation is done in chunks smaller
than what a 'long' can represent.
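
A sketch of the chunked pattern (loosely modelled on zlib's zpipe.c
example, not GfW code), where no single zlib call ever sees more than a
small fixed chunk, so the platform's 'long' never limits the stream:

/* Sketch only: deflate an arbitrarily large buffer in 16 KiB chunks. */
#include <stdio.h>
#include <stddef.h>
#include <zlib.h>

#define CHUNK 16384

int deflate_large(const unsigned char *in, size_t len, FILE *out)
{
    z_stream s = {0};
    unsigned char buf[CHUNK];
    size_t off = 0;
    int flush;

    if (deflateInit(&s, Z_DEFAULT_COMPRESSION) != Z_OK)
        return -1;

    do {
        size_t n = len - off < CHUNK ? len - off : CHUNK;
        s.next_in = (Bytef *)(in + off);
        s.avail_in = (uInt)n;               /* never more than 16 KiB */
        off += n;
        flush = (off == len) ? Z_FINISH : Z_NO_FLUSH;

        do {                                /* drain all pending output */
            s.next_out = buf;
            s.avail_out = CHUNK;
            deflate(&s, flush);
            fwrite(buf, 1, CHUNK - s.avail_out, out);
        } while (s.avail_out == 0);
    } while (flush != Z_FINISH);

    deflateEnd(&s);
    return 0;
}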

On the 'Linux' version the limit is a 64-bit address and file size, so
probably not a real issue on those platforms ;-)

There has been a body of work on changing Git file sizes to use size_t,
but such a change is extensive and may not be acceptable because of the
size of the code churn. I (and others!) had been looking at it:
https://github.com/git-for-windows/git/pull/2179.

Hope that helps clarify the large file support limitation on Windows.

Philip
