git@vger.kernel.org mailing list mirror (one of many)
From: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
To: Jeff King <peff@peff.net>
Cc: "brian m. carlson" <sandals@crustytoothpaste.net>,
	Stephen Smith <ischis2@cox.net>, git <git@vger.kernel.org>
Subject: Re: SHA-256 transition
Date: Fri, 24 Jun 2022 17:49:22 +0200
Message-ID: <220624.868rpmhx8e.gmgdl@evledraar.gmail.com>
In-Reply-To: <YrWXdNGZGN7gXL40@coredump.intra.peff.net>


On Fri, Jun 24 2022, Jeff King wrote:

> On Wed, Jun 22, 2022 at 12:29:59AM +0000, brian m. carlson wrote:
>
>> > We've since migrated our default hash function from SHA-1 to SHA-1DC
>> > (except on vanilla OSX, see [2]). It's a variant SHA-1 that detects the
>> > SHAttered attack implemented by the same researchers. I'm not aware of a
>> > current viable SHA-1 collision against the variant of SHA-1 that we
>> > actually use these days.
>> 
>> That's true, but that still doesn't let you store the data.  There is
>> some data that you can't store in a SHA-1 repository, and SHA-1DC is
>> extremely slow.  Using SHA-256 can make things like indexing packs
>> substantially faster.
>
> I'm curious if you have numbers on this. I naively converted linux.git
> to sha256 by doing "fast-export | fast-import" (the latter in a sha256
> repo, of course, and then both repacked with "-f --window=250" to get
> reasonable apples-to-apples packs).
>
> Running "index-pack --verify" on the result takes about the same time
> (this is on an 8-core system, hence the real/user differences):
>
>   [sha1dc]
>   real	2m43.754s
>   user	10m52.452s
>   sys	0m36.745s
>
>   [sha256]
>   real	2m41.884s
>   user	12m23.344s
>   sys	0m35.222s
>
> The sha256 repo actually has about 10% fewer objects (I didn't
> investigate, but this is perhaps due to cutting out tags and a few other
> things to convince fast-export to finish running). I'm not sure about
> the extra user time (multicore timings here are funny because of
> frequency scaling, so I think the "real" line is more interesting). So
> sha256 actually comes out a bit worse here. On the other hand, this is
> just using our blk_SHA256 implementation. There may be faster
> alternatives (including ones with hardware support).
>
> I wouldn't be at all surprised if the difference isn't substantial in
> the long run, though. The repo is on the order of 100GB of object data.
> That's a lot to hash, but it's also just a lot to deal with at all (zlib
> inflating, applying deltas, etc).
>
> Anyway, this is a pretty rough cut at an experiment. I was mostly
> curious if you had done something more advanced, and/or gotten different
> results.
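
For reference, the quoted timings can be reduced to sha256/sha1dc
ratios. This is only a sketch of the arithmetic: the helper names
(parse_time, ratios) are made up for illustration, and the numbers are
copied verbatim from the quoted message rather than re-measured.

```python
import re


def parse_time(text: str) -> float:
    """Convert a `time`-style duration such as '2m43.754s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", text)
    if match is None:
        raise ValueError(f"unrecognized duration: {text!r}")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)


# Timings copied from the quoted "index-pack --verify" runs.
TIMINGS = {
    "sha1dc": {"real": "2m43.754s", "user": "10m52.452s", "sys": "0m36.745s"},
    "sha256": {"real": "2m41.884s", "user": "12m23.344s", "sys": "0m35.222s"},
}


def ratios(timings: dict) -> dict:
    """Return the sha256/sha1dc ratio for each of real, user and sys."""
    return {
        kind: parse_time(timings["sha256"][kind])
        / parse_time(timings["sha1dc"][kind])
        for kind in ("real", "user", "sys")
    }


if __name__ == "__main__":
    for kind, ratio in ratios(TIMINGS).items():
        print(f"{kind}: sha256/sha1dc = {ratio:.3f}")
```

By these numbers the sha256 run used about 1% less wall-clock time and
about 14% more user time, which matches the "about the same time"
reading of the "real" line.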

I haven't checked or verified this, but
https://www.marc-stevens.nl/research/#software claims:

    Counter-cryptanalysis: New improved release SHA-1 collision
    detection library, which protects against twice as many SHA-1 attack
    classes (disturbance vectors), but is 9 times faster than previous
    version. Speed is now 1.87 times normal SHA-1. It is currently used
    among others by Git, GitHub, GMail, Google Drive and Microsoft
    OneDrive.
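
Taken at face value, and assuming "1.87 times normal SHA-1" means the
library costs 1.87 times the runtime of plain SHA-1 (my reading, not
something the page spells out), the claim implies the following for the
previous version. The function name below is hypothetical.

```python
# Figures from the quoted release note; not independently verified.
CURRENT_COST_VS_SHA1 = 1.87   # assumed: runtime relative to plain SHA-1
SPEEDUP_VS_PREVIOUS = 9.0     # "9 times faster than previous version"


def previous_cost_vs_sha1(current_cost: float, speedup: float) -> float:
    """Relative runtime of the previous release implied by the two claims."""
    return current_cost * speedup


if __name__ == "__main__":
    previous = previous_cost_vs_sha1(CURRENT_COST_VS_SHA1, SPEEDUP_VS_PREVIOUS)
    print(f"previous release: roughly {previous:.1f}x the cost of plain SHA-1")
```

So which version of the library a given benchmark used matters a great
deal when comparing sha1dc numbers.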

And looking at the OID you initially imported for sha1dc (and at my
later submodule import), we seem to have always had that performance
improvement, which I think (though I didn't have time to benchmark it)
is:
https://github.com/cr-marcstevens/sha1collisiondetection/pull/20

*But* there was also this later performance work:
https://github.com/cr-marcstevens/sha1collisiondetection/pull/30; see
also this comment:
https://github.com/cr-marcstevens/sha1collisiondetection/commit/33a694a9ee1b79c24be45f9eab5ac0e1aeeaf271

If you look at the sha1collisiondetection repo, the latest tag is
stable-v1.0.3. It was tagged in 2017 and pre-dates that later work (but
not the original perf work). There have been a lot of commits since then.

I wasn't able to find any third party package using DC_SHA1_EXTERNAL,
but I wonder if any performance tests with sha1dc in the wild are using
some older version, which from the looks of it might have had a
performance regression on x86...


