git@vger.kernel.org mailing list mirror (one of many)
From: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
To: "brian m. carlson" <sandals@crustytoothpaste.net>
Cc: Stephen Smith <ischis2@cox.net>, git <git@vger.kernel.org>,
	Jeff King <peff@peff.org>, Kyle Meyer <kyle@kyleam.com>
Subject: Re: SHA-256 transition
Date: Fri, 24 Jun 2022 00:21:05 +0200
Message-ID: <220624.86fsjvj690.gmgdl@evledraar.gmail.com>
In-Reply-To: <YrI9dvfoc5NYgVDq@tapette.crustytoothpaste.net>


On Wed, Jun 22 2022, brian m. carlson wrote:

> On 2022-06-21 at 10:25:01, Ævar Arnfjörð Bjarmason wrote:
>> 
>> But the reason I'd still say "no" on the technical/UX side is:
>> 
>>  * The inter-op between SHA-256 and SHA-1 repositories is still
>>    nonexistent, except for a one-off import. I.e. we don't have any
>>    graceful way to migrate an existing repository.
>
> True, but that doesn't mean that new repositories couldn't use SHA-256.

Indeed, and people who know enough about its state can (and in some
cases probably should) use it.

I took the start of the thread to be a question about the state of the
SHA-1 -> SHA-256 transition, and what we should be generally
recommending to users at this point.

>>  * For new repositories I think you'll probably want to eventually push
>>    it to one of the online git hosting providers, none of which (as far
>>    as I'm aware) support SHA-256 repos.
>
> This, in my view, is the only compelling reason not to use it for new
> repositories.

I think that's certainly the main one, given that most people's
workflows around Git are heavily forge-based.

>>  * Even if not, any local git tooling that's not part of git.git is
>>    likely to break, often for trivial reasons like expecting SHA-1 sized
>>    hashes in the output, but if you start using it for your repositories
>>    and use such tools you're very likely to be the first person to run
>>    into bugs in those areas.
>
> It's my hope to see libgit2 working on SHA-256 repositories in the
> relatively near future.

I was referring to the very long tail of tooling here.

E.g. I use magit with Emacs, and last I checked it would puke on
SHA-256. But checking again it seems someone patched it in January of
this year to e.g. change "{40}" in regexes to "{40,}", so in theory it
should work now (but I didn't try actually using it in that mode).
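
To illustrate the kind of change involved, here is a minimal Python
sketch. It is not magit's actual code, and the pattern and function
names are made up for the example; only the "{40}" and "{40,}"
quantifiers come from the description above.

```python
import re

# Hypothetical patterns; only the quantifiers are taken from the text above.
SHA1_ONLY = re.compile(r"[0-9a-f]{40}")    # exactly 40 hex digits
ANY_LENGTH = re.compile(r"[0-9a-f]{40,}")  # 40 or more hex digits

sha1_oid = "e69de29bb2d1d6434b8b29ae775ad8c2e48c5391"  # 40 hex digits
sha256_oid = "0" * 64  # placeholder with the length of a SHA-256 object ID


def accepts(pattern, oid):
    """Return True if the whole string is matched by the pattern."""
    return pattern.fullmatch(oid) is not None
```

Here accepts(SHA1_ONLY, sha256_oid) is false while
accepts(ANY_LENGTH, sha256_oid) is true. With an unanchored search,
"{40}" would instead match only the first 40 digits, silently
truncating the ID.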

We even still have UI code shipped as part of git.git itself that only
supports SHA-1, e.g. git-gui's "blame" feature. We were discussing some
patches for that late last year, but they didn't make it in:
https://lore.kernel.org/git/20211011121757.627-1-carenas@gmail.com/

Any individual tool like that isn't critical, but I'd think there's a
large long tail of tooling that git users are likely to interact with,
and which for the most part isn't ready.

I looked at "tig"'s source now, which I only very occasionally use, and
it still has SHA-1 sized constants hardcoded etc...
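
As a rough sketch of the alternative to hardcoded sizes (this is not
tig's code, and the helper names are hypothetical), a tool can derive
the hex length of an object ID from the hash algorithm's name. The blob
ID computation below uses the well-known "blob <size>\0" header, and
Python's hashlib as a stand-in for git's own hash backends:

```python
import hashlib


def hex_len(algo):
    """Length in hex digits of an object ID for the given hash algorithm."""
    return hashlib.new(algo).digest_size * 2


def blob_oid(data, algo="sha1"):
    """Hash a blob the way git does: header, NUL byte, then the content.

    Note: hashlib's "sha1" is plain SHA-1 without collision detection,
    so this only illustrates the object ID format.
    """
    header = b"blob %d\0" % len(data)
    return hashlib.new(algo, header + data).hexdigest()
```

With this, hex_len("sha1") is 40 and hex_len("sha256") is 64, so code
that sizes buffers or validates IDs doesn't need to assume either one.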

Of course that's a chicken & egg problem, and at some point we'll need
more brave early adopters. I'm only trying to relay the ground truth of
what the state is now, for someone who might not be aware of the
potential trouble they're getting themselves into.

>> But more importantly (and note that these views are definitely *not*
>> shared by some other project members, so take it with a grain of salt):
>> There just isn't any compelling selling point to migrate to SHA-256 in
>> the near or foreseeable future for a given individual user of git.
>
> I wholly disagree.  SHA-1 is obsolete, and as soon as hosting providers
> support SHA-256, all new repositories should be SHA-256.  There is no
> other defensible reason to continue to use SHA-1 today.

I really don't think we disagree on the need to move away from SHA-1 to
SHA-256. I'm only attempting to summarize the practical threat, and how
users might rightly weigh that against other concerns.

NIST deprecated SHA-1 in 2011. I think it's safe to say, given Git's
growth, that most people who've used Git started using it after that
date, so clearly there's a large disconnect between official hash
algorithm recommendations and how that translates to practical concerns.

>> The reason we started the SHA-1 -> $newhash (it wasn't known that it
>> would be SHA-256 at the time) was in response to https://shattered.io;
>> Although it had been discussed before, e.g. the thread starting at [1]
>> in 2012.
>> 
>> We've since migrated our default hash function from SHA-1 to SHA-1DC
>> (except on vanilla OSX, see [2]). It's a variant SHA-1 that detects the
>> SHAttered attack implemented by the same researchers. I'm not aware of a
>> current viable SHA-1 collision against the variant of SHA-1 that we
>> actually use these days.
>
> That's true, but that still doesn't let you store the data.  There is
> some data that you can't store in a SHA-1 repository, [...]

That's correct, and I don't think it's come up before, but has anyone
wanted to do that? I.e. people aren't generating these collisions
accidentally; they're crafted.

If we did want to store those we could change the hardcoded
-DSHA1DC_INIT_SAFE_HASH_DEFAULT=0 to "1". Right now it's set up to just
die if it finds a collision, but it could be made to return the "safe
hash" instead.

Of course doing so would mean going all-in on SHA1DC, i.e. such a
repository couldn't interop with our optional OpenSSL and other vanilla
SHA-1 backends.

> [...]and SHA-1DC is extremely slow.  Using SHA-256 can make things
> like indexing packs substantially faster.

Yeah, there are a lot of advantages. We could also safely use hardware
acceleration.

Really, I'm not meaning to poo-poo SHA-256 here, just to provide some
summary of the current state a user might expect.

I do think even this is mostly a fringe benefit in practice. I feel that
pain when I e.g. clone chromium.git, but once I pay that one-off cost
it's mostly not a bottleneck you notice on incremental push/fetch. You
pay for it on "repack", but that's in the background for most users.

It sure would make hosting providers happy though...

We have discussed having our cake & eating it too here in the
past. I.e. we could safely use, say, OpenSSL SHA-1 for "repack", as
long as we kept state and only did so for objects reachable from tips
that we'd already validated with SHA-1DC.

I think it's a datapoint that even those of us who've noticed the hash
slowdown have found it painful, but not so painful that we've invested
the effort in even relatively low-hanging-fruit workarounds for the
problem.

...

Finally, I'd really like to thank you for all your work on SHA-256 so
far, and really hope that none of what I've said here is discouraging in
any way. This thread has received some attention outside this ML (on
LWN), so I wanted to clarify some of the points above. Thanks!
