git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Jonathan Nieder <jrnieder@gmail.com>,
	"brian m. carlson" <sandals@crustytoothpaste.net>,
	Git Mailing List <git@vger.kernel.org>,
	Johannes Schindelin <Johannes.Schindelin@gmx.de>,
	demerphq <demerphq@gmail.com>,
	agl@google.com, keccak@noekeon.org
Subject: Re: Hash algorithm analysis
Date: Tue, 12 Jun 2018 01:27:20 +0200	[thread overview]
Message-ID: <878t7kx5t3.fsf@evledraar.gmail.com> (raw)
In-Reply-To: <CA+55aFw+E9GT7TKC_EgPTVcvHR8HDSipNPa7VQ1ASeL1M68xMQ@mail.gmail.com>


On Mon, Jun 11 2018, Linus Torvalds wrote:

> On Mon, Jun 11, 2018 at 12:29 PM Jonathan Nieder <jrnieder@gmail.com> wrote:
>>
>> Yves Orton and Linus Torvalds prefer[5] SHA3 over SHA2 because of how
>> it is constructed.
>
> Yeah, I really think that it's a mistake to switch to something that
> has the same problem SHA1 had.
>
> That doesn't necessarily mean SHA3, but it does mean "bigger
> intermediate hash state" (so no length extension attack), which could
> be SHA3, but also SHA-512/256 or K12.
>
> Honestly, git has effectively already moved from SHA1 to SHA1DC.
>
> So the actual known attack and weakness of SHA1 should simply not be
> part of the discussion for the next hash. You can basically say "we're
> _already_ on the second hash, we just picked one that was so
> compatible with SHA1 that nobody even really noticed.
>
> The reason to switch is
>
>  (a) 160 bits may not be enough
>
>  (b) maybe there are other weaknesses in SHA1 that SHA1DC doesn't catch.
>
>  (c) others?
>
> Obviously all of the choices address (a).

FWIW I updated our docs 3 months ago to try to address some of this:
https://github.com/git/git/commit/5988eb631a

> But at least for me, (b) makes me go "well, SHA2 has the exact same
> weak inter-block state attack, so if there are unknown weaknesses in
> SHA1, then what about unknown weaknesses in SHA2"?
>
> And no, I'm not a cryptographer. But honestly, length extension
> attacks were how both md5 and sha1 were broken in practice, so I'm
> just going "why would we go with a crypto choice that has that known
> weakness? That's just crazy".

What do you think about Johannes's summary of this being a non-issue for
Git in
https://public-inbox.org/git/alpine.DEB.2.21.1.1706151122180.4200@virtualbox/
?

> From a performance standpoint, I have to say (once more) that crypto
> performance actually mattered a lot less than I originally thought it
> would. Yes, there are phases that do care, but they are rare.

One real-world case is rebasing[1]. As noted in that E-Mail of mine a
year ago we can use SHA1DC v.s. OpenSSL as a stand-in for the sort of
performance difference we might expect between hash functions, although
as you note this doesn't account for the difference in length.

With our perf tests, in t/perf on linux.git:

    $ GIT_PERF_LARGE_REPO=~/g/linux GIT_PERF_REPEAT_COUNT=10 GIT_PERF_MAKE_COMMAND='if pwd | grep -q $(git rev-parse origin/master); then make -j8 CFLAGS=-O3 DC_SHA1=Y; else make -j8 CFLAGS=-O3 OPENSSL_SHA1=Y; fi' ./run origin/master~ origin/master -- p3400-rebase.sh
    Test                                                            origin/master~    origin/master
    --------------------------------------------------------------------------------------------------------
    3400.2: rebase on top of a lot of unrelated changes             1.38(1.19+0.11)   1.40(1.23+0.10) +1.4%
    3400.4: rebase a lot of unrelated changes without split-index   4.07(3.28+0.66)   4.62(3.71+0.76) +13.5%
    3400.6: rebase a lot of unrelated changes with split-index      3.41(2.94+0.38)   3.35(2.87+0.37) -1.8%

On a bigger monorepo I have here:

    Test                                                            origin/master~    origin/master
    -------------------------------------------------------------------------------------------------------
    3400.2: rebase on top of a lot of unrelated changes             1.39(1.19+0.17)   1.34(1.16+0.16) -3.6%
    3400.4: rebase a lot of unrelated changes without split-index   6.67(3.37+0.63)   6.95(3.90+0.62) +4.2%
    3400.6: rebase a lot of unrelated changes with split-index      3.70(2.85+0.45)   3.73(2.85+0.41) +0.8%

I didn't paste any numbers in that E-Mail a year ago, maybe I produced
them differently, but this is clerly not that of a "big difference". But
this is one way to see the difference.

> For example, I think SHA1 performance has probably mattered most for
> the index and pack-file, where it's really only used as a fancy CRC.
> For most individual object cases, it is almost never an issue.

Yeah there's lots of things we could optimize there, but we are going to
need to hash things to create the commit in e.g. the rebase case, but
much of that could probably be done more efficiently without switching
the hash.

> From a performance angle, I think the whole "256-bit hashes are
> bigger" is going to be the more noticeable performance issue, just
> because things like delta compression and zlib - both of which are
> very *real* and present performance issues - will have more data that
> they need to work on. The performance difference between different
> hashing functions is likely not all that noticeable in most common
> cases as long as we're not talking orders of magnitude.
>
> And yes, I guess we're in the "approaching an order of magnitude"
> performance difference, but we should actually compare not to OpenSSL
> SHA1, but to SHA1DC. See above.
>
> Personally, the fact that the Keccak people would suggest K12 makes me
> think that should be a front-runner, but whatever. I don't think the
> 128-bit preimage case is an issue, since 128 bits is the brute-force
> cost for any 256-bit hash.
>
> But hey, I picked sha1 to begin with, so take any input from me with
> that historical pinch of salt in mind ;)

1. https://public-inbox.org/git/87tw3f8vez.fsf@gmail.com/

  reply	other threads:[~2018-06-11 23:27 UTC|newest]

Thread overview: 66+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-06-09 20:56 State of NewHash work, future directions, and discussion brian m. carlson
2018-06-09 21:26 ` Ævar Arnfjörð Bjarmason
2018-06-09 22:49 ` Hash algorithm analysis brian m. carlson
2018-06-11 19:29   ` Jonathan Nieder
2018-06-11 20:20     ` Linus Torvalds
2018-06-11 23:27       ` Ævar Arnfjörð Bjarmason [this message]
2018-06-12  0:11         ` David Lang
2018-06-12  0:45         ` Linus Torvalds
2018-06-11 22:35     ` brian m. carlson
2018-06-12 16:21       ` Gilles Van Assche
2018-06-13 23:58         ` brian m. carlson
2018-06-15 10:33           ` Gilles Van Assche
2018-07-20 21:52     ` brian m. carlson
2018-07-21  0:31       ` Jonathan Nieder
2018-07-21 19:52       ` Ævar Arnfjörð Bjarmason
2018-07-21 20:25         ` brian m. carlson
2018-07-21 22:38       ` Johannes Schindelin
2018-07-21 23:09         ` Linus Torvalds
2018-07-21 23:59         ` brian m. carlson
2018-07-22  9:34           ` Eric Deplagne
2018-07-22 14:21             ` brian m. carlson
2018-07-22 14:55               ` Eric Deplagne
2018-07-26 10:05                 ` Johannes Schindelin
2018-07-22 15:23           ` Joan Daemen
2018-07-22 18:54             ` Adam Langley
2018-07-26 10:31             ` Johannes Schindelin
2018-07-23 12:40           ` demerphq
2018-07-23 12:48             ` Sitaram Chamarty
2018-07-23 12:55               ` demerphq
2018-07-23 18:23               ` Linus Torvalds
2018-07-23 17:57             ` Stefan Beller
2018-07-23 18:35             ` Jonathan Nieder
2018-07-24 19:01       ` Edward Thomson
2018-07-24 20:31         ` Linus Torvalds
2018-07-24 20:49           ` Jonathan Nieder
2018-07-24 21:13           ` Junio C Hamano
2018-07-24 22:10             ` brian m. carlson
2018-07-30  9:06               ` Johannes Schindelin
2018-07-30 20:01                 ` Dan Shumow
2018-08-03  2:57                   ` Jonathan Nieder
2018-09-18 15:18                   ` Joan Daemen
2018-09-18 15:32                     ` Jonathan Nieder
2018-09-18 16:50                     ` Linus Torvalds
2018-07-25  8:30             ` [PATCH 0/2] document that NewHash is now SHA-256 Ævar Arnfjörð Bjarmason
2018-07-25  8:30             ` [PATCH 1/2] doc hash-function-transition: note the lack of a changelog Ævar Arnfjörð Bjarmason
2018-07-25  8:30             ` [PATCH 2/2] doc hash-function-transition: pick SHA-256 as NewHash Ævar Arnfjörð Bjarmason
2018-07-25 16:45               ` Junio C Hamano
2018-07-25 17:25                 ` Jonathan Nieder
2018-07-25 21:32                   ` Junio C Hamano
2018-07-26 13:41                     ` [PATCH v2 " Ævar Arnfjörð Bjarmason
2018-08-03  7:20                       ` Jonathan Nieder
2018-08-03 16:40                         ` Junio C Hamano
2018-08-03 17:01                           ` Linus Torvalds
2018-08-03 16:42                         ` Linus Torvalds
2018-08-03 17:43                         ` Ævar Arnfjörð Bjarmason
2018-08-04  8:52                           ` Jonathan Nieder
2018-08-03 17:45                         ` brian m. carlson
2018-07-25 22:56                 ` [PATCH " brian m. carlson
2018-06-11 21:19   ` Hash algorithm analysis Ævar Arnfjörð Bjarmason
2018-06-21  8:20     ` Johannes Schindelin
2018-06-21 22:39     ` brian m. carlson
2018-06-11 18:09 ` State of NewHash work, future directions, and discussion Duy Nguyen
2018-06-12  1:28   ` brian m. carlson
2018-06-11 19:01 ` Jonathan Nieder
2018-06-12  2:28   ` brian m. carlson
2018-06-12  2:42     ` Jonathan Nieder

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=878t7kx5t3.fsf@evledraar.gmail.com \
    --to=avarab@gmail.com \
    --cc=Johannes.Schindelin@gmx.de \
    --cc=agl@google.com \
    --cc=demerphq@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=jrnieder@gmail.com \
    --cc=keccak@noekeon.org \
    --cc=sandals@crustytoothpaste.net \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).