git@vger.kernel.org mailing list mirror (one of many)
* SHA1 collisions found
@ 2017-02-23 16:43 Joey Hess
  2017-02-23 17:00 ` David Lang
                   ` (5 more replies)
  0 siblings, 6 replies; 136+ messages in thread
From: Joey Hess @ 2017-02-23 16:43 UTC (permalink / raw)
  To: git

[-- Attachment #1: Type: text/plain, Size: 748 bytes --]

https://shattered.io/static/shattered.pdf
https://freedom-to-tinker.com/2017/02/23/rip-sha-1/

IIRC someone has been working on parameterizing git's SHA1 assumptions
so a repository could eventually use a more secure hash. How far has
that gotten? There are still many "40" constants in git.git HEAD.
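
(A crude way to count them, for the curious - the pattern is rough and
will match more than just hash lengths:)

  $ git grep -nw 40 -- '*.c' '*.h' | wc -l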

In the meantime, git commit -S, and checks that commits are signed,
seems like the only way to mitigate against attacks such as
the ones described in the threads at
https://joeyh.name/blog/sha-1/ and
https://joeyh.name/blog/entry/size_of_the_git_sha1_collision_attack_surface/
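
For example (assuming a gpg signing key is already configured; the
commands below are illustrative):

  $ git commit -S -m 'some change'   # sign the commit
  $ git verify-commit HEAD           # verify the signature
  $ git log --show-signature -1      # display it in the log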

Since we now have collisions in valid PDF files, collisions in valid git
commit and tree objects are probably able to be constructed.

-- 
see shy jo

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 136+ messages in thread

* Re: SHA1 collisions found
  2017-02-23 16:43 SHA1 collisions found Joey Hess
@ 2017-02-23 17:00 ` David Lang
  2017-02-23 17:02 ` Junio C Hamano
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 136+ messages in thread
From: David Lang @ 2017-02-23 17:00 UTC (permalink / raw)
  To: Joey Hess; +Cc: git

On Thu, 23 Feb 2017, Joey Hess wrote:

> https://shattered.io/static/shattered.pdf
> https://freedom-to-tinker.com/2017/02/23/rip-sha-1/
>
> IIRC someone has been working on parameterizing git's SHA1 assumptions
> so a repository could eventually use a more secure hash. How far has
> that gotten? There are still many "40" constants in git.git HEAD.
>
> In the meantime, git commit -S, and checks that commits are signed,
> seems like the only way to mitigate against attacks such as
> the ones described in the threads at
> https://joeyh.name/blog/sha-1/ and
> https://joeyh.name/blog/entry/size_of_the_git_sha1_collision_attack_surface/
>
> Since we now have collisions in valid PDF files, collisions in valid git
> commit and tree objects are probably able to be constructed.

keep in mind that there is a huge difference between

creating a collision between two documents you create, both of which contain a 
huge amount of arbitrary binary data that can be changed at will without 
affecting the results

and

creating a collision between an existing document that someone else created and a 
new document that is also valid C code without huge amounts of binary in it.

So, it's not time to panic, but it is one more push to make the changes to 
support something else.

David Lang

^ permalink raw reply	[flat|nested] 136+ messages in thread

* Re: SHA1 collisions found
  2017-02-23 16:43 SHA1 collisions found Joey Hess
  2017-02-23 17:00 ` David Lang
@ 2017-02-23 17:02 ` Junio C Hamano
  2017-02-23 17:12   ` David Lang
                     ` (3 more replies)
  2017-02-23 17:19 ` Linus Torvalds
                   ` (3 subsequent siblings)
  5 siblings, 4 replies; 136+ messages in thread
From: Junio C Hamano @ 2017-02-23 17:02 UTC (permalink / raw)
  To: Joey Hess; +Cc: Git Mailing List

On Thu, Feb 23, 2017 at 8:43 AM, Joey Hess <id@joeyh.name> wrote:
>
> Since we now have collisions in valid PDF files, collisions in valid git
> commit and tree objects are probably able to be constructed.

That may be true, but
https://public-inbox.org/git/Pine.LNX.4.58.0504291221250.18901@ppc970.osdl.org/

^ permalink raw reply	[flat|nested] 136+ messages in thread

* Re: SHA1 collisions found
  2017-02-23 17:02 ` Junio C Hamano
@ 2017-02-23 17:12   ` David Lang
  2017-02-23 20:49     ` Jakub Narębski
  2017-02-23 17:18   ` Junio C Hamano
                     ` (2 subsequent siblings)
  3 siblings, 1 reply; 136+ messages in thread
From: David Lang @ 2017-02-23 17:12 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Joey Hess, Git Mailing List

On Thu, 23 Feb 2017, Junio C Hamano wrote:

> On Thu, Feb 23, 2017 at 8:43 AM, Joey Hess <id@joeyh.name> wrote:
>>
>> Since we now have collisions in valid PDF files, collisions in valid git
>> commit and tree objects are probably able to be constructed.
>
> That may be true, but
> https://public-inbox.org/git/Pine.LNX.4.58.0504291221250.18901@ppc970.osdl.org/
>

it doesn't help that the Google page on this explicitly says that this shows 
that it's possible to create two different git repos that have the same hash but 
different contents.

https://shattered.it/

How is GIT affected?
GIT strongly relies on SHA-1 for the identification and integrity checking of 
all file objects and commits. It is essentially possible to create two GIT 
repositories with the same head commit hash and different contents, say a benign 
source code and a backdoored one. An attacker could potentially selectively 
serve either repository to targeted users. This will require attackers to 
compute their own collision.

David Lang

^ permalink raw reply	[flat|nested] 136+ messages in thread

* Re: SHA1 collisions found
  2017-02-23 17:02 ` Junio C Hamano
  2017-02-23 17:12   ` David Lang
@ 2017-02-23 17:18   ` Junio C Hamano
  2017-02-23 17:35   ` Joey Hess
  2017-02-23 19:20   ` David Lang
  3 siblings, 0 replies; 136+ messages in thread
From: Junio C Hamano @ 2017-02-23 17:18 UTC (permalink / raw)
  To: Joey Hess; +Cc: Git Mailing List

On Thu, Feb 23, 2017 at 9:02 AM, Junio C Hamano <gitster@pobox.com> wrote:
> On Thu, Feb 23, 2017 at 8:43 AM, Joey Hess <id@joeyh.name> wrote:
>>
>> Since we now have collisions in valid PDF files, collisions in valid git
>> commit and tree objects are probably able to be constructed.
>
> That may be true, but
> https://public-inbox.org/git/Pine.LNX.4.58.0504291221250.18901@ppc970.osdl.org/

IOW, we want to continue the work to switch from SHA-1, but today's announcement
does not fundamentally change anything and we do not panic.

^ permalink raw reply	[flat|nested] 136+ messages in thread

* Re: SHA1 collisions found
  2017-02-23 16:43 SHA1 collisions found Joey Hess
  2017-02-23 17:00 ` David Lang
  2017-02-23 17:02 ` Junio C Hamano
@ 2017-02-23 17:19 ` Linus Torvalds
  2017-02-23 17:29   ` Linus Torvalds
  2017-02-23 18:10   ` Joey Hess
  2017-02-24  9:42 ` Duy Nguyen
                   ` (2 subsequent siblings)
  5 siblings, 2 replies; 136+ messages in thread
From: Linus Torvalds @ 2017-02-23 17:19 UTC (permalink / raw)
  To: Joey Hess; +Cc: Git Mailing List

On Thu, Feb 23, 2017 at 8:43 AM, Joey Hess <id@joeyh.name> wrote:
>
> IIRC someone has been working on parameterizing git's SHA1 assumptions
> so a repository could eventually use a more secure hash. How far has
> that gotten? There are still many "40" constants in git.git HEAD.

I don't think you'd necessarily want to change the size of the hash.
You can use a different hash and just use the same 160 bits from it.

> Since we now have collisions in valid PDF files, collisions in valid git
> commit and tree objects are probably able to be constructed.

I haven't seen the attack yet, but git doesn't actually just hash the
data, it does prepend a type/length field to it. That usually tends to
make collision attacks much harder, because you either have to make
the resulting size the same too, or you have to be able to also edit
the size field in the header.
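
(A quick illustration of that header - the same input hashes
differently once git's "blob <size>" prefix is accounted for:)

  $ echo hello | sha1sum
  f572d396fae9206628714fb2ce00f72e94f2258f  -
  $ echo hello | git hash-object --stdin
  ce013625030ba8dba906f756967f9e9ca394464a
  $ { printf 'blob 6\0'; echo hello; } | sha1sum
  ce013625030ba8dba906f756967f9e9ca394464a  -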

pdf's don't have that issue: they have a fixed header, and you can
fairly arbitrarily add silent data to the middle that just doesn't get
shown.

So pdf's make for a much better attack vector, exactly because they
are a fairly opaque data format. Git has opaque data in some places
(we hide things in commit objects intentionally, for example), but by
definition that opaque data is fairly secondary.

Put another way: I doubt the sky is falling for git as a source
control management tool. Do we want to migrate to another hash? Yes.
Is it "game over" for SHA1 like people want to say? Probably not.

I haven't seen the attack details, but I bet

 (a) the fact that we have a separate size encoding makes it much
harder to do on git objects in the first place

 (b) we can probably easily add some extra sanity checks to the opaque
data we do have, to make it much harder to do the hiding of random
data that these attacks pretty much always depend on.

                Linus

^ permalink raw reply	[flat|nested] 136+ messages in thread

* Re: SHA1 collisions found
  2017-02-23 17:19 ` Linus Torvalds
@ 2017-02-23 17:29   ` Linus Torvalds
  2017-02-23 18:10   ` Joey Hess
  1 sibling, 0 replies; 136+ messages in thread
From: Linus Torvalds @ 2017-02-23 17:29 UTC (permalink / raw)
  To: Joey Hess; +Cc: Git Mailing List

On Thu, Feb 23, 2017 at 9:19 AM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> I don't think you'd necessarily want to change the size of the hash.
> You can use a different hash and just use the same 160 bits from it.

Side note: I do believe that in practice you should just change the
size of the hash too, I'm just saying that the size of the hash and
the choice of the hash algorithm are independent issues.

So you *could* just use  something like SHA3-256, but then pick the
first 160 bits.

Realistically, changing the few hardcoded sizes internally in git is
likely the least problem in switching hashes.

So what you'd probably do is switch to a 256-bit hash, use that
internally and in the native git database, and then by default only
*show* the hash as a 40-character hex string (kind of like how we
already abbreviate things in many situations).

That way tools around git don't even see the change unless passed in
some special "--full-hash" argument (or "--abbrev=64" or whatever -
the default being that we abbreviate to 40).
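
(The existing plumbing already works that way in miniature - the
lengths here are illustrative:)

  $ git rev-parse HEAD              # full 40-hex name
  $ git rev-parse --short=12 HEAD   # abbreviated form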

               Linus

^ permalink raw reply	[flat|nested] 136+ messages in thread

* Re: SHA1 collisions found
  2017-02-23 17:02 ` Junio C Hamano
  2017-02-23 17:12   ` David Lang
  2017-02-23 17:18   ` Junio C Hamano
@ 2017-02-23 17:35   ` Joey Hess
  2017-02-23 17:52     ` Linus Torvalds
  2017-02-23 17:52     ` David Lang
  2017-02-23 19:20   ` David Lang
  3 siblings, 2 replies; 136+ messages in thread
From: Joey Hess @ 2017-02-23 17:35 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Git Mailing List

[-- Attachment #1: Type: text/plain, Size: 1602 bytes --]

Junio C Hamano wrote:
> On Thu, Feb 23, 2017 at 8:43 AM, Joey Hess <id@joeyh.name> wrote:
> >
> > Since we now have collisions in valid PDF files, collisions in valid git
> > commit and tree objects are probably able to be constructed.
> 
> That may be true, but
> https://public-inbox.org/git/Pine.LNX.4.58.0504291221250.18901@ppc970.osdl.org/

That's about someone replacing a valid object in Linus's repository
with an invalid random blob they found that collides. This SHA1
break doesn't allow generating such a blob anyway. Linus is right,
that's an impractical attack.

Attacks using this SHA1 break will look something more like:

* I push a "bad" object to a repo on github I set up under a
  pseudonym.
* I publish a "good" object in a commit and convince the maintainer to
  merge it.
* I wait for the maintainer to push to github.
* I wait for github to deduplicate and hope they'll replace the good
  object with the bad one I pre-uploaded, thus silently changing the
  content of the good commit the maintainer reviewed and pushed.
* The bad object is pulled from github and deployed.
* The maintainer still has the good object. They may not notice the bad
  object is out there for a long time.

Of course, it doesn't need to involve Github, and doesn't need to
rely on internal details of their deduplication[1]; 
that only lets me publish the bad object under a pseudonym.

-- 
see shy jo

[1] Which I'm only guessing about, but now that we have colliding
    objects, we can upload them to different repos and see if such
    deduplication happens.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 136+ messages in thread

* Re: SHA1 collisions found
  2017-02-23 17:35   ` Joey Hess
@ 2017-02-23 17:52     ` Linus Torvalds
  2017-02-23 18:21       ` Joey Hess
  2017-02-23 17:52     ` David Lang
  1 sibling, 1 reply; 136+ messages in thread
From: Linus Torvalds @ 2017-02-23 17:52 UTC (permalink / raw)
  To: Joey Hess; +Cc: Junio C Hamano, Git Mailing List

On Thu, Feb 23, 2017 at 9:35 AM, Joey Hess <id@joeyh.name> wrote:
>
> Attacks using this SHA1 break will look something more like:

We don't actually know what the break is, but it's likely that you
can't actually do what you think you can do:

> * I push a "bad" object to a repo on github I set up under a
>   pseudonym.
> * I publish a "good" object in a commit and convince the maintainer to
>   merge it.

It's not clear that the "good" object can be anything sane.

What you describe pretty much already requires a pre-image attack,
which the new attack is _not_.

The new attack doesn't have a controlled "good" case; you need two
different objects that both have "near-collision" blocks in the
middle. I don't know what the format of those near-collision blocks
is, but it's a big problem.

You blithely just say "I create a good object". It's not that simple.
If it was, this would be a pre-image attack.

So basically, the attack needs some kind of random binary garbage in
*both* objects in the middle.

That's why pdf's are the classic model for showing these attacks: it's
easy to insert garbage in the middle of a pdf that is invisible.

In a pdf, you can just define a bitmap that you don't use for printing
- but you can use it to then make a decision about what to print -
making the printed version of the pdf look radically different in ways
that are not so much _directly_ about the invisible block itself.

              Linus

^ permalink raw reply	[flat|nested] 136+ messages in thread

* Re: SHA1 collisions found
  2017-02-23 17:35   ` Joey Hess
  2017-02-23 17:52     ` Linus Torvalds
@ 2017-02-23 17:52     ` David Lang
  1 sibling, 0 replies; 136+ messages in thread
From: David Lang @ 2017-02-23 17:52 UTC (permalink / raw)
  To: Joey Hess; +Cc: Junio C Hamano, Git Mailing List

On Thu, 23 Feb 2017, Joey Hess wrote:

> Junio C Hamano wrote:
>> On Thu, Feb 23, 2017 at 8:43 AM, Joey Hess <id@joeyh.name> wrote:
>>>
>>> Since we now have collisions in valid PDF files, collisions in valid git
>>> commit and tree objects are probably able to be constructed.
>>
>> That may be true, but
>> https://public-inbox.org/git/Pine.LNX.4.58.0504291221250.18901@ppc970.osdl.org/
>
> That's about someone replacing a valid object in Linus's repository
> with an invalid random blob they found that collides. This SHA1
> break doesn't allow generating such a blob anyway. Linus is right,
> that's an impractical attack.
>
> Attacks using this SHA1 break will look something more like:
>
> * I push a "bad" object to a repo on github I set up under a
>  pseudonym.
> * I publish a "good" object in a commit and convince the maintainer to
>  merge it.
> * I wait for the maintainer to push to github.
> * I wait for github to deduplicate and hope they'll replace the good
>  object with the bad one I pre-uploaded, thus silently changing the
>  content of the good commit the maintainer reviewed and pushed.
> * The bad object is pulled from github and deployed.
> * The maintainer still has the good object. They may not notice the bad
>  object is out there for a long time.
>
> Of course, it doesn't need to involve Github, and doesn't need to
> rely on internal details of their deduplication[1];
> that only lets me publish the bad object under a pseudonym.

Read that e-mail again; it covers the case where a central server gets a blob 
replaced in it.

Tricking a maintainer into accepting a file that contains huge amounts of binary 
data is going to be a non-trivial task, and even after you trick them into 
accepting one bad file, you then need to replace the file they accepted with a 
new one (breaking into github or assuming that github is putting both files into 
the same repo, both of which are fairly unlikely).

David Lang

^ permalink raw reply	[flat|nested] 136+ messages in thread

* Re: SHA1 collisions found
  2017-02-23 17:19 ` Linus Torvalds
  2017-02-23 17:29   ` Linus Torvalds
@ 2017-02-23 18:10   ` Joey Hess
  2017-02-23 18:29     ` Linus Torvalds
  2017-02-23 18:38     ` Junio C Hamano
  1 sibling, 2 replies; 136+ messages in thread
From: Joey Hess @ 2017-02-23 18:10 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Git Mailing List

[-- Attachment #1: Type: text/plain, Size: 1699 bytes --]

Linus Torvalds wrote:
> I haven't seen the attack yet, but git doesn't actually just hash the
> data, it does prepend a type/length field to it. That usually tends to
> make collision attacks much harder, because you either have to make
> the resulting size the same too, or you have to be able to also edit
> the size field in the header.

I have some sha1 collisions (and other fun along these lines) in 
https://github.com/joeyh/supercollider

That includes two files with the same SHA and size, which do get
different blobs thanks to the way git prepends the header to the
content.

joey@darkstar:~/tmp/supercollider>sha1sum  bad.pdf good.pdf 
d00bbe65d80f6d53d5c15da7c6b4f0a655c5a86a  bad.pdf
d00bbe65d80f6d53d5c15da7c6b4f0a655c5a86a  good.pdf
joey@darkstar:~/tmp/supercollider>git ls-tree HEAD
100644 blob ca44e9913faf08d625346205e228e2265dd12b65	bad.pdf
100644 blob 5f90b67523865ad5b1391cb4a1c010d541c816c1	good.pdf

While appending identical data to these colliding files does generate
other collisions, prepending data does not.
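
(That follows from SHA1 processing input block by block: once two
inputs of equal length collide, any common suffix preserves the
collision. Easy to check - these two print the same digest:)

  $ { cat bad.pdf; echo suffix; } | sha1sum
  $ { cat good.pdf; echo suffix; } | sha1sum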

It would cost 6500 CPU years + 100 GPU years to generate valid colliding
git objects using the methods of the paper's authors. That might be cost
effective if it helped get a backdoor into, e.g., the kernel.

>  (b) we can probably easily add some extra sanity checks to the opaque
> data we do have, to make it much harder to do the hiding of random
> data that these attacks pretty much always depend on.

For example, git fsck does warn about a commit message with opaque
data hidden after a NUL. But, git show/merge/pull give no indication
that something funky is going on when working with such commits.

-- 
see shy jo

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 136+ messages in thread

* Re: SHA1 collisions found
  2017-02-23 17:52     ` Linus Torvalds
@ 2017-02-23 18:21       ` Joey Hess
  2017-02-23 18:31         ` Joey Hess
                           ` (2 more replies)
  0 siblings, 3 replies; 136+ messages in thread
From: Joey Hess @ 2017-02-23 18:21 UTC (permalink / raw)
  To: Linus Torvalds, Git Mailing List

[-- Attachment #1: Type: text/plain, Size: 591 bytes --]

Linus Torvalds wrote:
> What you describe pretty much already requires a pre-image attack,
> which the new attack is _not_.
> 
> It's not clear that the "good" object can be anything sane.

Generate a regular commit object; use the entire commit object + NUL as the
chosen prefix, and use the identical-prefix collision attack to generate
the colliding good/bad objects.

(The size in git's object header is a minor complication. Set the size
field to something sufficiently large, and then pad out the colliding
objects to that size once they're generated.)

-- 
see shy jo

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 136+ messages in thread

* Re: SHA1 collisions found
  2017-02-23 18:10   ` Joey Hess
@ 2017-02-23 18:29     ` Linus Torvalds
  2017-02-23 18:38     ` Junio C Hamano
  1 sibling, 0 replies; 136+ messages in thread
From: Linus Torvalds @ 2017-02-23 18:29 UTC (permalink / raw)
  To: Joey Hess; +Cc: Git Mailing List

On Thu, Feb 23, 2017 at 10:10 AM, Joey Hess <id@joeyh.name> wrote:
>
> It would cost 6500 CPU years + 100 GPU years to generate valid colliding
> git objects using the methods of the paper's authors. That might be cost
> effective if it helped get a backdoor into, e.g., the kernel.

I still think it also needs to be interesting enough data, not just
random noise that is then trivial to find with automated tools.

Because for the kernel, it's not just that an attacker needs to do the
CPU time. Yes, first he needs the technical resources to just do
the attack and create the situation you described.

But then he *also* needs to build up the social capital to get the end
result pulled into the tree (ie if he depends on the hidden spaces, he
needs somebody to actually do a git pull, not just apply a patch).

.. and if we then have a tool that finds the problem trivially
(ie "git fsck"), he's not only wasted all those technical resources,
he's also burned his identity.

>>  (b) we can probably easily add some extra sanity checks to the opaque
>> data we do have, to make it much harder to do the hiding of random
>> data that these attacks pretty much always depend on.
>
> For example, git fsck does warn about a commit message with opaque
> data hidden after a NUL. But, git show/merge/pull give no indication
> that something funky is going on when working with such commits.

I do agree that we might want to do some of the fsck checks
particularly at fetch time. That's when doing checks is both relevant
and cheap.
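
(The per-direction knobs for that already exist:)

  $ git config --global fetch.fsckObjects true
  $ git config --global receive.fsckObjects true
  $ git config --global transfer.fsckObjects true   # covers both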

So we could do the opaque data checks, but we could/should probably
also add the attack pattern ("disturbance vectors") checks.

And the thing is, adding those checks is really cheap, and basically
makes the whole attack vector pointless against git.

Because unlike some "signing a pdf" attack, git doesn't fundamentally
depend on the SHA1 as some kind of absolute security.  If we have the
minimal machinery in git to just notice the attack, the attack
essentially goes away. Attackers can waste infinite amounts of CPU
time, and if it's cheap for us to notice, it completely disarms all
that attack work.

Again, I'm not arguing that people shouldn't work on extending git to
a new (and bigger) hash. I think that's a no-brainer, and we do want
to have a path to eventually move towards SHA3-256 or whatever.

But I'm very definitely arguing that the current attack doesn't
actually sound like it really even _matters_, because it should be so
easy to mitigate against.

                   Linus

^ permalink raw reply	[flat|nested] 136+ messages in thread

* Re: SHA1 collisions found
  2017-02-23 18:21       ` Joey Hess
@ 2017-02-23 18:31         ` Joey Hess
  2017-02-23 19:13           ` Morten Welinder
  2017-02-23 18:40         ` Linus Torvalds
  2017-02-23 18:42         ` Jeff King
  2 siblings, 1 reply; 136+ messages in thread
From: Joey Hess @ 2017-02-23 18:31 UTC (permalink / raw)
  To: Linus Torvalds, Git Mailing List

[-- Attachment #1: Type: text/plain, Size: 886 bytes --]

Joey Hess wrote:
> Linus Torvalds wrote:
> > What you describe pretty much already requires a pre-image attack,
> > which the new attack is _not_.
> > 
> > It's not clear that the "good" object can be anything sane.
> 
> Generate a regular commit object; use the entire commit object + NUL as the
> chosen prefix, and use the identical-prefix collision attack to generate
> the colliding good/bad objects.
> 
> (The size in git's object header is a minor complication. Set the size
> field to something sufficiently large, and then pad out the colliding
> objects to that size once they're generated.)

Sorry! While that would work, it's a useless attack because the good and bad
commit objects still point to the same tree.

It would be interesting to have such colliding objects, to see what breaks,
but probably not worth $75k to generate them.

-- 
see shy jo

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 136+ messages in thread

* Re: SHA1 collisions found
  2017-02-23 18:10   ` Joey Hess
  2017-02-23 18:29     ` Linus Torvalds
@ 2017-02-23 18:38     ` Junio C Hamano
  1 sibling, 0 replies; 136+ messages in thread
From: Junio C Hamano @ 2017-02-23 18:38 UTC (permalink / raw)
  To: Joey Hess; +Cc: Linus Torvalds, Git Mailing List

Joey Hess <id@joeyh.name> writes:

> For example, git fsck does warn about a commit message with opaque
> data hidden after a NUL. But, git show/merge/pull give no indication
> that something funky is going on when working with such commits.

Would

    $ git config transfer.fsckobjects true

help?

^ permalink raw reply	[flat|nested] 136+ messages in thread

* Re: SHA1 collisions found
  2017-02-23 18:21       ` Joey Hess
  2017-02-23 18:31         ` Joey Hess
@ 2017-02-23 18:40         ` Linus Torvalds
  2017-02-23 18:46           ` Jeff King
  2017-02-23 18:42         ` Jeff King
  2 siblings, 1 reply; 136+ messages in thread
From: Linus Torvalds @ 2017-02-23 18:40 UTC (permalink / raw)
  To: Joey Hess; +Cc: Git Mailing List

On Thu, Feb 23, 2017 at 10:21 AM, Joey Hess <id@joeyh.name> wrote:
> Linus Torvalds wrote:
>> What you describe pretty much already requires a pre-image attack,
>> which the new attack is _not_.
>>
>> It's not clear that the "good" object can be anything sane.
>
> Generate a regular commit object; use the entire commit object + NUL as the
> chosen prefix, and use the identical-prefix collision attack to generate
> the colliding good/bad objects.

So I agree with you that we need to make git check for the opaque
data. I think I was the one who brought that whole argument up.

But even then, what you describe doesn't work. What you describe just
replaces the opaque data - that git doesn't actually *use*, and that
nobody sees - with another piece of opaque data.

You also need to make the non-opaque data of the bad object be
something that actually encodes valid git data with interesting hashes
in it (for the parent/tree/whatever pointers).

So you don't have just that "chosen prefix". You actually need to also
fill in some very specific piece of data *in* the attack parts itself.
And you need to do this in the exact same size (because that's part of
the prefix), etc etc.

So I think it's challenging.

... and then we can discover it trivially.

Ok, so "git fsck" right now takes a couple of minutes for me and I
don't actually run it very often (I used to run it religiously back in
the day), but afaik kernel.org actually runs it nightly. So it's
pretty much "trivially discoverable" - imagine spending thousands of
CPU-hours and lots of social capital to get an attack in, and then the
next night the kernel.org fsck complains about the strange commit you
added?
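
(A nightly cron entry along these lines is all it takes - the paths
here are hypothetical:)

  # run a full fsck at 03:00 and keep the output for review
  0 3 * * * cd /srv/git/linux.git && git fsck --full >/var/log/git-fsck.log 2>&1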

                  Linus

^ permalink raw reply	[flat|nested] 136+ messages in thread

* Re: SHA1 collisions found
  2017-02-23 18:21       ` Joey Hess
  2017-02-23 18:31         ` Joey Hess
  2017-02-23 18:40         ` Linus Torvalds
@ 2017-02-23 18:42         ` Jeff King
  2 siblings, 0 replies; 136+ messages in thread
From: Jeff King @ 2017-02-23 18:42 UTC (permalink / raw)
  To: Joey Hess; +Cc: Linus Torvalds, Git Mailing List

On Thu, Feb 23, 2017 at 02:21:47PM -0400, Joey Hess wrote:

> Linus Torvalds wrote:
> > What you describe pretty much already requires a pre-image attack,
> > which the new attack is _not_.
> > 
> > It's not clear that the "good" object can be anything sane.
> 
> Generate a regular commit object; use the entire commit object + NUL as the
> chosen prefix, and use the identical-prefix collision attack to generate
> the colliding good/bad objects.

FWIW, git-fsck complains about those (and transfer.fsck rejects them):

  $ (git cat-file commit HEAD; printf '\0more stuff') |
    git hash-object -w --stdin -t commit
  ecb2e5165c184f9025cb4c49d8f75901f4830354

  $ git fsck
  warning in commit ecb2e5165c184f9025cb4c49d8f75901f4830354: nulInCommit: NUL byte in the commit object body

So as long as either your "good" or "evil" commit has binary junk in it,
you are likely to be noticed (not everybody turns on transfer.fsck, but
GitHub does).

-Peff

^ permalink raw reply	[flat|nested] 136+ messages in thread

* Re: SHA1 collisions found
  2017-02-23 18:40         ` Linus Torvalds
@ 2017-02-23 18:46           ` Jeff King
  2017-02-23 19:09             ` Linus Torvalds
  2017-02-23 20:46             ` Joey Hess
  0 siblings, 2 replies; 136+ messages in thread
From: Jeff King @ 2017-02-23 18:46 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Joey Hess, Git Mailing List

On Thu, Feb 23, 2017 at 10:40:48AM -0800, Linus Torvalds wrote:

> > Generate a regular commit object; use the entire commit object + NUL as the
> > chosen prefix, and use the identical-prefix collision attack to generate
> > the colliding good/bad objects.
> 
> So I agree with you that we need to make git check for the opaque
> data. I think I was the one who brought that whole argument up.

We do already.

> But even then, what you describe doesn't work. What you describe just
> replaces the opaque data - that git doesn't actually *use*, and that
> nobody sees - with another piece of opaque data.
> 
> You also need to make the non-opaque data of the bad object be
> something that actually encodes valid git data with interesting hashes
> in it (for the parent/tree/whatever pointers).
> 
> So you don't have just that "chosen prefix". You actually need to also
> fill in some very specific piece of data *in* the attack part itself.
> And you need to do this in the exact same size (because that's part of
> the prefix), etc etc.

It's not an identical prefix, but I think collision attacks generally
are along the lines of selecting two prefixes followed by garbage, and
then mutating the garbage on both sides. That would "work" in this case
(modulo the fact that git would complain about the NUL).

I haven't read the paper yet to see if that is the case here, though.

A related case is if you could stick a "cruft ...." header at the end of
the commit headers, and mutate its value (avoiding newlines). fsck
doesn't complain about that.
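
(A sketch of such an object - the "cruft" header name is made up, and
--literally makes hash-object skip its own format checks:)

  $ git cat-file commit HEAD |
    awk 'NR==1,/^$/ { if ($0 == "") print "cruft 0123456789abcdef" } { print }' |
    git hash-object -t commit --stdin --literally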

-Peff

^ permalink raw reply	[flat|nested] 136+ messages in thread

* Re: SHA1 collisions found
  2017-02-23 18:46           ` Jeff King
@ 2017-02-23 19:09             ` Linus Torvalds
  2017-02-23 19:32               ` Jeff King
  2017-02-23 20:47               ` Øyvind A. Holm
  2017-02-23 20:46             ` Joey Hess
  1 sibling, 2 replies; 136+ messages in thread
From: Linus Torvalds @ 2017-02-23 19:09 UTC (permalink / raw)
  To: Jeff King; +Cc: Joey Hess, Git Mailing List

On Thu, Feb 23, 2017 at 10:46 AM, Jeff King <peff@peff.net> wrote:
>>
>> So I agree with you that we need to make git check for the opaque
>> data. I think I was the one who brought that whole argument up.
>
> We do already.

I'm aware of the fsck checks, but I have to admit I wasn't aware of
'transfer.fsckobjects'. I should turn that on myself.

Or maybe git should just turn it on by default? At least the
per-object fsck costs should be essentially free compared to the
network costs when you just apply them to the incoming objects.

I also do think that it would be good to check for the disturbance
vectors at receive time (and fsck). Not necessarily interesting during
normal operations.

And in particular, while the *kernel* doesn't generally have critical
opaque blobs, other projects do. Things like firmware images etc are
open to attack, and crazy people put ISO images in repositories etc.

So I don't think this discussion should focus exclusively on the git metadata.

It is likely much easier to replace a binary blob than it is to
replace a commit or tree (or a source file that has to go through a
compiler). And for many projects, that would be a bad thing.

> It's not an identical prefix, but I think collision attacks generally
> are along the lines of selecting two prefixes followed by garbage, and
> then mutating the garbage on both sides. That would "work" in this case
> (modulo the fact that git would complain about the NUL).

I think this particular attack depended on an actual identical prefix,
but I didn't go back to the paper and check.

But the attacks tend to very much depend on particular input bit
patterns that have very particular effects on the resulting
intermediate hash, and those bit patterns are specific to the hash and
known.

So a very powerful defense is to just look for those bit patterns in
the objects, and just warn about them. Those patterns don't tend to
exist in normal inputs anyway, but particularly if you just warn, it's
a heads-up that "ok, something iffy is going on".

And as mentioned, a cheap "something iffy is going on" thing is
basically a death sentence to SCM attacks.

The whole _point_ of an SCM is that it isn't about a one-time event,
but about continuous history. That also fundamentally means that a
successful attack needs to work over time, and not be detectable.

In contrast, many other uses of hashes are "one-time" events.  If you
use a hash to validate a single piece of data from a source that you
wouldn't otherwise trust, it's a one-time "all or nothing" trust
situation.

And the attack surface is very different for those "one-time" vs
"trust over time" cases. If you can get a bank to trust a session one
time, you can empty a bank account and live on a paradise island for
the rest of your life. It doesn't matter if it gets detected or not
after-the-fact.

But if you can fool an SCM one time, insert your code, and it gets
detected next week, you didn't actually do anything useful. You only
burned yourself.

See the difference? One-time vs having a continual interaction makes a
*fundamental* difference in game theory.

                Linus

^ permalink raw reply	[flat|nested] 136+ messages in thread

* Re: SHA1 collisions found
  2017-02-23 18:31         ` Joey Hess
@ 2017-02-23 19:13           ` Morten Welinder
  2017-02-24 15:52             ` Geert Uytterhoeven
  0 siblings, 1 reply; 136+ messages in thread
From: Morten Welinder @ 2017-02-23 19:13 UTC (permalink / raw)
  To: Joey Hess; +Cc: Linus Torvalds, Git Mailing List

The attack seems to generate two 64-byte blocks, one quarter of which
is repeated data.  (Table 1 in the paper.)

Assuming the result of that is evenly distributed and that bytes are
independent, we can estimate the chances that the result is NUL-free
as (255/256)^192 = 47% and the probability that the result is NUL and
newline free as (254/256)^192 = 22%.  Clearly one should not rely on
NULs or newlines to save the day.  On the other hand, the chances of
an ASCII result are something like (95/256)^192 = 10^-83.
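
(Those estimates are easy to reproduce:)

  $ awk 'BEGIN { printf "%.2f %.2f %.1e\n", (255/256)^192, (254/256)^192, (95/256)^192 }'
  0.47 0.22 2.2e-83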

The actual collision in the paper has no newline, but it does have a NUL.

M.




On Thu, Feb 23, 2017 at 1:31 PM, Joey Hess <id@joeyh.name> wrote:
> Joey Hess wrote:
>> Linus Torvalds wrote:
>> > What you describe pretty much already requires a pre-image attack,
>> > which the new attack is _not_.
>> >
>> > It's not clear that the "good" object can be anything sane.
>>
>> Generate a regular commit object; use the entire commit object + NUL as the
>> chosen prefix, and use the identical-prefix collision attack to generate
>> the colliding good/bad objects.
>>
>> (The size in git's object header is a minor complication. Set the size
>> field to something sufficiently large, and then pad out the colliding
>> objects to that size once they're generated.)
>
> Sorry! While that would work, it's a useless attack because the good and bad
> commit objects still point to the same tree.
>
> It would be interesting to have such colliding objects, to see what breaks,
> but probably not worth $75k to generate them.
>
> --
> see shy jo

^ permalink raw reply	[flat|nested] 136+ messages in thread

* Re: SHA1 collisions found
  2017-02-23 17:02 ` Junio C Hamano
                     ` (2 preceding siblings ...)
  2017-02-23 17:35   ` Joey Hess
@ 2017-02-23 19:20   ` David Lang
  3 siblings, 0 replies; 136+ messages in thread
From: David Lang @ 2017-02-23 19:20 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Joey Hess, Git Mailing List


Pointers to a little more info:


https://shattered.it/static/
The two files are:

https://shattered.it/static/shattered-1.pdf
https://shattered.it/static/shattered-2.pdf

422435 shattered-2.pdf
422435 shattered-1.pdf

Identical length, and a lot smaller than I expected (~162K of the 413K file is 
binary junk).


$ sha1sum shattered-*pdf
38762cf7f55934b34d179ae6a4c80cadccbb7f0a  shattered-1.pdf
38762cf7f55934b34d179ae6a4c80cadccbb7f0a  shattered-2.pdf

$ sum shattered-*pdf
62721   413 shattered-1.pdf
41606   413 shattered-2.pdf

$ md5sum shattered-*pdf
ee4aa52b139d925f8d8884402b0a750c  shattered-1.pdf
5bd9d8cabc46041579a311230539b8d1  shattered-2.pdf

David Lang

^ permalink raw reply	[flat|nested] 136+ messages in thread

* Re: SHA1 collisions found
  2017-02-23 19:09             ` Linus Torvalds
@ 2017-02-23 19:32               ` Jeff King
  2017-02-23 19:47                 ` Linus Torvalds
  2017-02-23 20:47               ` Øyvind A. Holm
  1 sibling, 1 reply; 136+ messages in thread
From: Jeff King @ 2017-02-23 19:32 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Joey Hess, Git Mailing List

On Thu, Feb 23, 2017 at 11:09:32AM -0800, Linus Torvalds wrote:

> On Thu, Feb 23, 2017 at 10:46 AM, Jeff King <peff@peff.net> wrote:
> >>
> >> So I agree with you that we need to make git check for the opaque
> >> data. I think I was the one who brought that whole argument up.
> >
> > We do already.
> 
> I'm aware of the fsck checks, but I have to admit I wasn't aware of
> 'transfer.fsckobjects'. I should turn that on myself.
> 
> Or maybe git should just turn it on by default? At least the
> per-object fsck costs should be essentially free compared to the
> network costs when you just apply them to the incoming objects.

Yeah, they're not expensive. We've discussed enabling them by default.
The sticking point is that there is old history with minor bugs which
triggers some warnings (e.g., malformed committer names), and it would
be annoying to start rejecting that unconditionally.

So I think we would need a good review of what is a "warning" versus an
"error", and to only reject on errors (right now the NUL thing is a
warning, and it should probably be upgraded).

> And in particular, while the *kernel* doesn't generally have critical
> opaque blobs, other projects do. Things like firmware images etc are
> open to attack, and crazy people put ISO images in repositories etc.
> 
> So I don't think this discussion should focus exclusively on the git metadata.
> 
> It is likely much easier to replace a binary blob than it is to
> replace a commit or tree (or a source file that has to go through a
> compiler). And for many projects, that would be a bad thing.

Yes, I'd agree we need to consider both. And no matter what Git does in
its own data formats, blobs will always be a sequence of bytes. Hiding
collision-cruft in them isn't up to us, but rather the data format.

The nice thing about a blob collision, though, is that you can only
replace the opaque files, not, say, C source code. That doesn't make it
a non-issue, but it reduces the scope of an attack.

Replacing a commit or tree wholesale means the attacker has a lot more
flexibility. So to whatever degree we can make that harder (like
complaining of commits with NULs), the better.

> > It's not an identical prefix, but I think collision attacks generally
> > are along the lines of selecting two prefixes followed by garbage, and
> > then mutating the garbage on both sides. That would "work" in this case
> > (modulo the fact that git would complain about the NUL).
> 
> I think this particular attack depended on an actual identical prefix,
> but I didn't go back to the paper and check.

The paper describes the content as:

  SHA-1(P | M1 | M2 | S)

and they replace both "M1" and "M2", with a near-collision for the
first, and then the final collision for the second. What's not clear to
me is if part of M1 can be chosen, or if it's perturbed fully into
random garbage.

> But the attacks tend to very much depend on particular input bit
> patterns that have very particular effects on the resulting
> intermediate hash, and those bit patterns are specific to the hash and
> known.
> 
> So a very powerful defense is to just look for those bit patterns in
> the objects, and just warn about them. Those patterns don't tend to
> exist in normal inputs anyway, but particularly if you just warn, it's
> a heads-ups that "ok, something iffy is going on"

Yes, that would be a wonderful hardening to put into Git if we know what
those patterns look like. That part isn't clear to me.

> The whole _point_ of an SCM is that it isn't about a one-time event,
> but about continuous history. That also fundamentally means that a
> successful attack needs to work over time, and not be detectable.

Yeah, I'd certainly agree with that. You spend loads of money to
generate a collision, there's a reasonably high chance of detection, and
then as soon as one person detects it, your investment is lost.

According to the paper, the current cost of the computation for a single
collision is ~$670K.

At least for now, an attacker is much better off using that money to
break into your house and install a keylogger.

-Peff

^ permalink raw reply	[flat|nested] 136+ messages in thread

* Re: SHA1 collisions found
  2017-02-23 19:32               ` Jeff King
@ 2017-02-23 19:47                 ` Linus Torvalds
  2017-02-23 19:57                   ` Jeff King
  0 siblings, 1 reply; 136+ messages in thread
From: Linus Torvalds @ 2017-02-23 19:47 UTC (permalink / raw)
  To: Jeff King; +Cc: Joey Hess, Git Mailing List

On Thu, Feb 23, 2017 at 11:32 AM, Jeff King <peff@peff.net> wrote:
>
> Yeah, they're not expensive. We've discussed enabling them by default.
> The sticking point is that there is old history with minor bugs which
> triggers some warnings (e.g., malformed committer names), and it would
> be annoying to start rejecting that unconditionally.
>
> So I think we would need a good review of what is a "warning" versus an
> "error", and to only reject on errors (right now the NUL thing is a
> warning, and it should probably upgraded).

I think even a warning (as opposed to failing the operation) is
already a big deal.

If people start saying "why do I get this odd warning", and start
looking into it, that's going to be a pretty strong defense against
bad behavior. SCM attacks depend on flying under the radar.

>> So a very powerful defense is to just look for those bit patterns in
>> the objects, and just warn about them. Those patterns don't tend to
>> exist in normal inputs anyway, but particularly if you just warn, it's
>> a heads-ups that "ok, something iffy is going on"
>
> Yes, that would be a wonderful hardening to put into Git if we know what
> those patterns look like. That part isn't clear to me.

There's actually already code for that, pointed to by the shattered project:

  https://github.com/cr-marcstevens/sha1collisiondetection

the "meat" of that check is in lib/ubc_check.c.

                  Linus

^ permalink raw reply	[flat|nested] 136+ messages in thread

* Re: SHA1 collisions found
  2017-02-23 19:47                 ` Linus Torvalds
@ 2017-02-23 19:57                   ` Jeff King
       [not found]                     ` <alpine.LFD.2.20.1702231428540.30435@i7.lan>
  0 siblings, 1 reply; 136+ messages in thread
From: Jeff King @ 2017-02-23 19:57 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Joey Hess, Git Mailing List

On Thu, Feb 23, 2017 at 11:47:16AM -0800, Linus Torvalds wrote:

> On Thu, Feb 23, 2017 at 11:32 AM, Jeff King <peff@peff.net> wrote:
> >
> > Yeah, they're not expensive. We've discussed enabling them by default.
> > The sticking point is that there is old history with minor bugs which
> > triggers some warnings (e.g., malformed committer names), and it would
> > be annoying to start rejecting that unconditionally.
> >
> > So I think we would need a good review of what is a "warning" versus an
> > "error", and to only reject on errors (right now the NUL thing is a
> > warning, and it should probably be upgraded).
> 
> I think even a warning (as opposed to failing the operation) is
> already a big deal.
> 
> If people start saying "why do I get this odd warning", and start
> looking into it, that's going to be a pretty strong defense against
> bad behavior. SCM attacks depend on flying under the radar.

Sorry, I conflated two things there. I agree a warning is better than
nothing. But right now transfer.fsck croaks even for warnings, and there
are some warnings that it is not worth croaking for. So before we turn
it on, we need to stop croaking on warnings (and possibly bump up some
warnings to errors).

I think it _is_ important to have dangerous things as errors, though.
Because it helps an unattended server (where nobody would see the
warning) avoid being a vector for spreading malicious objects to older
clients which do not do the fsck.

> There's actually already code for that, pointed to by the shattered project:
> 
>   https://github.com/cr-marcstevens/sha1collisiondetection
> 
> the "meat" of that check is in lib/ubc_check.c.

Thanks, I hadn't seen that yet. That doesn't look like it should be hard
to integrate into Git.

-Peff

^ permalink raw reply	[flat|nested] 136+ messages in thread

* Re: SHA1 collisions found
  2017-02-23 18:46           ` Jeff King
  2017-02-23 19:09             ` Linus Torvalds
@ 2017-02-23 20:46             ` Joey Hess
  1 sibling, 0 replies; 136+ messages in thread
From: Joey Hess @ 2017-02-23 20:46 UTC (permalink / raw)
  To: Jeff King; +Cc: Git Mailing List

[-- Attachment #1: Type: text/plain, Size: 975 bytes --]

Jeff King wrote:
> It's not an identical prefix, but I think collision attacks generally
> are along the lines of selecting two prefixes followed by garbage, and
> then mutating the garbage on both sides. That would "work" in this case
> (modulo the fact that git would complain about the NUL).
> 
> I haven't read the paper yet to see if that is the case here, though.

The current attack is an identical-prefix attack, not chosen-prefix, so
not quite to that point yet.

The MD5 chosen-prefix attack was 2^15 times harder than the identical-prefix
attack, but who knows if the numbers will be comparable for SHA1.

> A related case is if you could stick a "cruft ...." header at the end of
> the commit headers, and mutate its value (avoiding newlines). fsck
> doesn't complain about that.

git log and git show don't show such cruft headers either.

BTW, the SHA attack only added ~128 bytes to the pdfs, not really a
huge amount of garbage.

-- 
see shy jo

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 136+ messages in thread

* Re: SHA1 collisions found
  2017-02-23 19:09             ` Linus Torvalds
  2017-02-23 19:32               ` Jeff King
@ 2017-02-23 20:47               ` Øyvind A. Holm
  1 sibling, 0 replies; 136+ messages in thread
From: Øyvind A. Holm @ 2017-02-23 20:47 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Jeff King, Joey Hess, Git Mailing List

[-- Attachment #1: Type: text/plain, Size: 1378 bytes --]

On 2017-02-23 11:09:32, Linus Torvalds wrote:
> I'm aware of the fsck checks, but I have to admit I wasn't aware of 
> 'transfer.fsckobjects'. I should turn that on myself.
>
> Or maybe git should just turn it on by default?

The problem with this is that there are many repos with errors out 
there, for example coreutils.git and nasm.git, which complain about 
"missingSpaceBeforeDate: invalid author/committer line - missing space 
before date".

There are also lots of repositories bitten by the Github bug from back 
in 2011 where they zero-padded the file modes; git clone aborts with 
"zeroPaddedFilemode: contains zero-padded file modes".

Paranoid as I am, I'm using fetch.fsckObjects and receive.fsckObjects 
set to "true", but that means I'm not able to clone repositories with 
these kinds of errors, so I have to use the alias

  fclone = clone -c "fetch.fsckObjects=false"

So enabling them by default will create problems among users. Of course, 
one solution would be to turn these kind of errors into warnings so the 
clone isn't aborted.

Regards,
Øyvind

+-| Øyvind A. Holm <sunny@sunbase.org> - N 60.37604° E 5.33339° |-+
| OpenPGP: 0xFB0CBEE894A506E5 - http://www.sunbase.org/pubkey.asc |
| Fingerprint: A006 05D6 E676 B319 55E2  E77E FB0C BEE8 94A5 06E5 |
+------------| c7e47a18-fa06-11e6-ad93-db5caa6d21d3 |-------------+

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

^ permalink raw reply	[flat|nested] 136+ messages in thread

* Re: SHA1 collisions found
  2017-02-23 17:12   ` David Lang
@ 2017-02-23 20:49     ` Jakub Narębski
  2017-02-23 20:57       ` Jeff King
  0 siblings, 1 reply; 136+ messages in thread
From: Jakub Narębski @ 2017-02-23 20:49 UTC (permalink / raw)
  To: David Lang, Junio C Hamano; +Cc: Joey Hess, Git Mailing List

W dniu 23.02.2017 o 18:12, David Lang pisze:
> On Thu, 23 Feb 2017, Junio C Hamano wrote:
> 
>> On Thu, Feb 23, 2017 at 8:43 AM, Joey Hess <id@joeyh.name> wrote:
>>> 
>>> Since we now have collisions in valid PDF files, collisions in
>>> valid git commit and tree objects are probably able to be
>>> constructed.
>> 
>> That may be true, but 
>> https://public-inbox.org/git/Pine.LNX.4.58.0504291221250.18901@ppc970.osdl.org/
>>
>
> it doesn't help that the Google page on this explicitly says that
> this shows that it's possible to create two different git repos that
> have the same hash but different contents.
> 
> https://shattered.it/
> 
> How is GIT affected? GIT strongly relies on SHA-1 for the
> identification and integrity checking of all file objects and
> commits. It is essentially possible to create two GIT repositories
> with the same head commit hash and different contents, say a benign
> source code and a backdoored one. An attacker could potentially
> selectively serve either repository to targeted users. This will
> require attackers to compute their own collision.

The attack on SHA-1 presented there is "identical-prefix" collision,
which is less powerful than "chosen-prefix" collision.  It is the
latter that is required to defeat SHA-1 used in object identity.
Objects in Git _must_ begin with a given prefix; the use of zlib
compression adds to the difficulty.  A 'forged' Git object would
simply not validate...

https://arstechnica.com/security/2017/02/at-deaths-door-for-years-widely-used-sha1-function-is-now-dead/


^ permalink raw reply	[flat|nested] 136+ messages in thread

* Re: SHA1 collisions found
  2017-02-23 20:49     ` Jakub Narębski
@ 2017-02-23 20:57       ` Jeff King
  0 siblings, 0 replies; 136+ messages in thread
From: Jeff King @ 2017-02-23 20:57 UTC (permalink / raw)
  To: Jakub Narębski
  Cc: David Lang, Junio C Hamano, Joey Hess, Git Mailing List

On Thu, Feb 23, 2017 at 09:49:09PM +0100, Jakub Narębski wrote:

> > How is GIT affected? GIT strongly relies on SHA-1 for the
> > identification and integrity checking of all file objects and
> > commits. It is essentially possible to create two GIT repositories
> > with the same head commit hash and different contents, say a benign
> > source code and a backdoored one. An attacker could potentially
> > selectively serve either repository to targeted users. This will
> > require attackers to compute their own collision.
> 
> The attack on SHA-1 presented there is "identical-prefix" collision,
> which is less powerful than "chosen-prefix" collision.  It is the
> latter that is required to defeat SHA-1 used in object identity.
> Objects in Git _must_ begin with given prefix;

I don't think this helps. The chosen-prefix lets you append hash data to
an existing file. Here we just have identical prefixes in the two
colliding halves. In the real-world example, they used a PDF header. But
it could have been a PDF header with "blob 1234" prepended to it (note
also that Git's use of the size doesn't help; the attack files are the
same length).

> the use of zlib
> compression adds to the difficulty.  'Forged' Git object would
> simply not validate...

No, zlib doesn't help. The sha1 is computed on the uncompressed data.

-Peff

^ permalink raw reply	[flat|nested] 136+ messages in thread

* Re: SHA1 collisions found
       [not found]                     ` <alpine.LFD.2.20.1702231428540.30435@i7.lan>
@ 2017-02-23 22:43                       ` Jeff King
  2017-02-23 22:50                         ` Linus Torvalds
  2017-02-23 23:05                         ` Jeff King
  0 siblings, 2 replies; 136+ messages in thread
From: Jeff King @ 2017-02-23 22:43 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Joey Hess, Git Mailing List

On Thu, Feb 23, 2017 at 02:38:29PM -0800, Linus Torvalds wrote:

> > Thanks, I hadn't seen that yet. That doesn't look like it should be hard
> > to integrate into Git.
> 
> Here's a *very* ugly patch that is absolutely disgusting and should not be 
> used. But it does kind of work (I tested it with a faked-up extra patch 
> that made git accept the broken pdf as a loose object).
> 
> What do I mean by "kind of work"? It uses that ugly and slow checking 
> SHA1 routine from the collision detection project for the SHA1 object 
> verification, and it means that "git fsck" ends up being about twice as 
> slow as it used to be.

Heh. I was just putting the finishing touches on a similar patch. Mine
is much less gross, in that it actually just adds a new USE_SHA1DC knob
(instead of, say, BLK_SHA1).

Here are the timings I came up with:

  - compute sha1 over whole packfile
    before: 1.349s
     after: 5.067s
    change: +275%

  - rev-list --all
    before: 5.742s
     after: 5.730s
    change: -0.2%

  - rev-list --all --objects
    before: 33.257s
     after: 33.392s
    change: +0.4%

  - index-pack --verify
    before: 2m20s
     after: 5m43s
    change: +145%

  - git log --no-merges -10000 -p
    before: 9.532s
     after: 9.683s
    change: +1.5%

So overall the sha1 computation is about 3-4x slower. But of
course most operations do more than just sha1. Accessing
commits and trees isn't slowed at all (both the +/- changes
there are well within the run-to-run noise). Accessing the
blobs is a little slower, but mostly drowned out by the cost
of things like actually generating patches.

The most-affected operation is `index-pack --verify`, which
is essentially just computing the sha1 on every object. It's
a bit worse than twice as slow, which means every push and
every fetch is going to experience that.

> For example, I suspect we could use our (much cleaner) block-sha1 
> implementation and include just the ubc_check.c code with that, instead of 
> the truly ugly C sha1 implementation that the sha1collisiondetection 
> project uses. 
> 
> But to do that, somebody would have to really know how the unavoidable 
> bit conditions check works with the intermediate hashes. I have only a 
> "big picture" mental model of it (read: I'm not competent to do that).

Yeah. I started looking at that, but the ubc check happens after the
initial expansion. But AFAICT, block-sha1 mixes that expansion in with
the rest of the steps for efficiency. So perhaps somebody who really
understands sha1 and the new checks could figure it out, but I'm not at
all certain that adding it in wouldn't lose some of block-sha1's
efficiency (on top of the time to actually do the ubc check).

-Peff

^ permalink raw reply	[flat|nested] 136+ messages in thread

* Re: SHA1 collisions found
  2017-02-23 22:43                       ` Jeff King
@ 2017-02-23 22:50                         ` Linus Torvalds
  2017-02-23 23:05                         ` Jeff King
  1 sibling, 0 replies; 136+ messages in thread
From: Linus Torvalds @ 2017-02-23 22:50 UTC (permalink / raw)
  To: Jeff King; +Cc: Joey Hess, Git Mailing List

On Thu, Feb 23, 2017 at 2:43 PM, Jeff King <peff@peff.net> wrote:
>
> Yeah. I started looking at that, but the ubc check happens after the
> initial expansion.

Yes. That's the point where I gave up and just included their ugly sha1.c file.

I suspect it can be done, but it would need somebody to really know
what they are doing.

            Linus

^ permalink raw reply	[flat|nested] 136+ messages in thread

* Re: SHA1 collisions found
  2017-02-23 22:43                       ` Jeff King
  2017-02-23 22:50                         ` Linus Torvalds
@ 2017-02-23 23:05                         ` Jeff King
  2017-02-23 23:05                           ` [PATCH 1/3] add collision-detecting sha1 implementation Jeff King
                                             ` (4 more replies)
  1 sibling, 5 replies; 136+ messages in thread
From: Jeff King @ 2017-02-23 23:05 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Joey Hess, Git Mailing List

On Thu, Feb 23, 2017 at 05:43:02PM -0500, Jeff King wrote:

> On Thu, Feb 23, 2017 at 02:38:29PM -0800, Linus Torvalds wrote:
> 
> > > Thanks, I hadn't seen that yet. That doesn't look like it should be hard
> > > to integrate into Git.
> > 
> > Here's a *very* ugly patch that is absolutely disgusting and should not be 
> > used. But it does kind of work (I tested it with a faked-up extra patch 
> > that made git accept the broken pdf as a loose object).
> > 
> > What do I mean by "kind of work"? It uses that ugly and slow checking 
> > SHA1 routine from the collision detection project for the SHA1 object 
> > verification, and it means that "git fsck" ends up being about twice as 
> > slow as it used to be.
> 
> Heh. I was just putting the finishing touches on a similar patch. Mine
> is much less gross, in that it actually just adds a new USE_SHA1DC knob
> (instead of, say, BLK_SHA1).

Here are my patches. They _might_ be worth including if only because they
shouldn't bother anybody unless they enable USE_SHA1DC. So it makes it a
bit more accessible for people to experiment with (or be paranoid with
if they like).

The first one is 98K. Mail headers may bump it over vger's 100K barrier.
It's actually the _least_ interesting patch of the 3, because it just
imports the code wholesale from the other project. But if it doesn't
make it, you can fetch the whole series from:

  https://github.com/peff/git jk/sha1dc

(By the way, I don't see your version on the list, Linus, which probably
means it was eaten by the 100K filter).

  [1/3]: add collision-detecting sha1 implementation
  [2/3]: sha1dc: adjust header includes for git
  [3/3]: Makefile: add USE_SHA1DC knob

 Makefile           |   10 +
 sha1dc/sha1.c      | 1165 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 sha1dc/sha1.h      |  108 +++++
 sha1dc/ubc_check.c |  361 ++++++++++++++++
 sha1dc/ubc_check.h |   33 ++
 5 files changed, 1677 insertions(+)
 create mode 100644 sha1dc/sha1.c
 create mode 100644 sha1dc/sha1.h
 create mode 100644 sha1dc/ubc_check.c
 create mode 100644 sha1dc/ubc_check.h
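
(Trying these out should just be a matter of building with
"make USE_SHA1DC=1", assuming the knob follows the Makefile's usual
ifdef-style convention.)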

-Peff

^ permalink raw reply	[flat|nested] 136+ messages in thread

* [PATCH 1/3] add collision-detecting sha1 implementation
  2017-02-23 23:05                         ` Jeff King
@ 2017-02-23 23:05                           ` Jeff King
  2017-02-23 23:15                             ` Stefan Beller
  2017-02-23 23:05                           ` [PATCH 2/3] sha1dc: adjust header includes for git Jeff King
                                             ` (3 subsequent siblings)
  4 siblings, 1 reply; 136+ messages in thread
From: Jeff King @ 2017-02-23 23:05 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Joey Hess, Git Mailing List

This is pulled straight from:

  https://github.com/cr-marcstevens/sha1collisiondetection

with no modifications yet (though I've pulled in only the
subset of files necessary for Git to use).

Signed-off-by: Jeff King <peff@peff.net>
---
 sha1dc/sha1.c      | 1146 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 sha1dc/sha1.h      |   94 +++++
 sha1dc/ubc_check.c |  361 +++++++++++++++++
 sha1dc/ubc_check.h |   35 ++
 4 files changed, 1636 insertions(+)
 create mode 100644 sha1dc/sha1.c
 create mode 100644 sha1dc/sha1.h
 create mode 100644 sha1dc/ubc_check.c
 create mode 100644 sha1dc/ubc_check.h

diff --git a/sha1dc/sha1.c b/sha1dc/sha1.c
new file mode 100644
index 000000000..ed2010911
--- /dev/null
+++ b/sha1dc/sha1.c
@@ -0,0 +1,1146 @@
+/***
+* Copyright 2017 Marc Stevens <marc@marc-stevens.nl>, Dan Shumow (danshu@microsoft.com) 
+* Distributed under the MIT Software License.
+* See accompanying file LICENSE.txt or copy at
+* https://opensource.org/licenses/MIT
+***/
+
+#include <string.h>
+#include <memory.h>
+#include <stdio.h>
+
+#include "sha1.h"
+#include "ubc_check.h"
+
+#define rotate_right(x,n) (((x)>>(n))|((x)<<(32-(n))))
+#define rotate_left(x,n)  (((x)<<(n))|((x)>>(32-(n))))
+
+#define sha1_f1(b,c,d) ((d)^((b)&((c)^(d))))
+#define sha1_f2(b,c,d) ((b)^(c)^(d))
+#define sha1_f3(b,c,d) (((b) & ((c)|(d))) | ((c)&(d)))
+#define sha1_f4(b,c,d) ((b)^(c)^(d))
+
+#define HASHCLASH_SHA1COMPRESS_ROUND1_STEP(a, b, c, d, e, m, t) \
+	{ e += rotate_left(a, 5) + sha1_f1(b,c,d) + 0x5A827999 + m[t]; b = rotate_left(b, 30); }
+#define HASHCLASH_SHA1COMPRESS_ROUND2_STEP(a, b, c, d, e, m, t) \
+	{ e += rotate_left(a, 5) + sha1_f2(b,c,d) + 0x6ED9EBA1 + m[t]; b = rotate_left(b, 30); }
+#define HASHCLASH_SHA1COMPRESS_ROUND3_STEP(a, b, c, d, e, m, t) \
+	{ e += rotate_left(a, 5) + sha1_f3(b,c,d) + 0x8F1BBCDC + m[t]; b = rotate_left(b, 30); }
+#define HASHCLASH_SHA1COMPRESS_ROUND4_STEP(a, b, c, d, e, m, t) \
+	{ e += rotate_left(a, 5) + sha1_f4(b,c,d) + 0xCA62C1D6 + m[t]; b = rotate_left(b, 30); }
+
+#define HASHCLASH_SHA1COMPRESS_ROUND1_STEP_BW(a, b, c, d, e, m, t) \
+	{ b = rotate_right(b, 30); e -= rotate_left(a, 5) + sha1_f1(b,c,d) + 0x5A827999 + m[t]; }
+#define HASHCLASH_SHA1COMPRESS_ROUND2_STEP_BW(a, b, c, d, e, m, t) \
+	{ b = rotate_right(b, 30); e -= rotate_left(a, 5) + sha1_f2(b,c,d) + 0x6ED9EBA1 + m[t]; }
+#define HASHCLASH_SHA1COMPRESS_ROUND3_STEP_BW(a, b, c, d, e, m, t) \
+	{ b = rotate_right(b, 30); e -= rotate_left(a, 5) + sha1_f3(b,c,d) + 0x8F1BBCDC + m[t]; }
+#define HASHCLASH_SHA1COMPRESS_ROUND4_STEP_BW(a, b, c, d, e, m, t) \
+	{ b = rotate_right(b, 30); e -= rotate_left(a, 5) + sha1_f4(b,c,d) + 0xCA62C1D6 + m[t]; }
+
+#define SHA1_STORE_STATE(i) states[i][0] = a; states[i][1] = b; states[i][2] = c; states[i][3] = d; states[i][4] = e;
+
+
+
+void sha1_message_expansion(uint32_t W[80])
+{
+	for (unsigned i = 16; i < 80; ++i)
+		W[i] = rotate_left(W[i - 3] ^ W[i - 8] ^ W[i - 14] ^ W[i - 16], 1);
+}
+
+void sha1_compression(uint32_t ihv[5], const uint32_t m[16])
+{
+	uint32_t W[80];
+
+	memcpy(W, m, 16 * 4);
+	for (unsigned i = 16; i < 80; ++i)
+		W[i] = rotate_left(W[i - 3] ^ W[i - 8] ^ W[i - 14] ^ W[i - 16], 1);
+
+	uint32_t a = ihv[0], b = ihv[1], c = ihv[2], d = ihv[3], e = ihv[4];
+
+	HASHCLASH_SHA1COMPRESS_ROUND1_STEP(a, b, c, d, e, W, 0);
+	HASHCLASH_SHA1COMPRESS_ROUND1_STEP(e, a, b, c, d, W, 1);
+	HASHCLASH_SHA1COMPRESS_ROUND1_STEP(d, e, a, b, c, W, 2);
+	HASHCLASH_SHA1COMPRESS_ROUND1_STEP(c, d, e, a, b, W, 3);
+	HASHCLASH_SHA1COMPRESS_ROUND1_STEP(b, c, d, e, a, W, 4);
+	HASHCLASH_SHA1COMPRESS_ROUND1_STEP(a, b, c, d, e, W, 5);
+	HASHCLASH_SHA1COMPRESS_ROUND1_STEP(e, a, b, c, d, W, 6);
+	HASHCLASH_SHA1COMPRESS_ROUND1_STEP(d, e, a, b, c, W, 7);
+	HASHCLASH_SHA1COMPRESS_ROUND1_STEP(c, d, e, a, b, W, 8);
+	HASHCLASH_SHA1COMPRESS_ROUND1_STEP(b, c, d, e, a, W, 9);
+	HASHCLASH_SHA1COMPRESS_ROUND1_STEP(a, b, c, d, e, W, 10);
+	HASHCLASH_SHA1COMPRESS_ROUND1_STEP(e, a, b, c, d, W, 11);
+	HASHCLASH_SHA1COMPRESS_ROUND1_STEP(d, e, a, b, c, W, 12);
+	HASHCLASH_SHA1COMPRESS_ROUND1_STEP(c, d, e, a, b, W, 13);
+	HASHCLASH_SHA1COMPRESS_ROUND1_STEP(b, c, d, e, a, W, 14);
+	HASHCLASH_SHA1COMPRESS_ROUND1_STEP(a, b, c, d, e, W, 15);
+	HASHCLASH_SHA1COMPRESS_ROUND1_STEP(e, a, b, c, d, W, 16);
+	HASHCLASH_SHA1COMPRESS_ROUND1_STEP(d, e, a, b, c, W, 17);
+	HASHCLASH_SHA1COMPRESS_ROUND1_STEP(c, d, e, a, b, W, 18);
+	HASHCLASH_SHA1COMPRESS_ROUND1_STEP(b, c, d, e, a, W, 19);
+
+	HASHCLASH_SHA1COMPRESS_ROUND2_STEP(a, b, c, d, e, W, 20);
+	HASHCLASH_SHA1COMPRESS_ROUND2_STEP(e, a, b, c, d, W, 21);
+	HASHCLASH_SHA1COMPRESS_ROUND2_STEP(d, e, a, b, c, W, 22);
+	HASHCLASH_SHA1COMPRESS_ROUND2_STEP(c, d, e, a, b, W, 23);
+	HASHCLASH_SHA1COMPRESS_ROUND2_STEP(b, c, d, e, a, W, 24);
+	HASHCLASH_SHA1COMPRESS_ROUND2_STEP(a, b, c, d, e, W, 25);
+	HASHCLASH_SHA1COMPRESS_ROUND2_STEP(e, a, b, c, d, W, 26);
+	HASHCLASH_SHA1COMPRESS_ROUND2_STEP(d, e, a, b, c, W, 27);
+	HASHCLASH_SHA1COMPRESS_ROUND2_STEP(c, d, e, a, b, W, 28);
+	HASHCLASH_SHA1COMPRESS_ROUND2_STEP(b, c, d, e, a, W, 29);
+	HASHCLASH_SHA1COMPRESS_ROUND2_STEP(a, b, c, d, e, W, 30);
+	HASHCLASH_SHA1COMPRESS_ROUND2_STEP(e, a, b, c, d, W, 31);
+	HASHCLASH_SHA1COMPRESS_ROUND2_STEP(d, e, a, b, c, W, 32);
+	HASHCLASH_SHA1COMPRESS_ROUND2_STEP(c, d, e, a, b, W, 33);
+	HASHCLASH_SHA1COMPRESS_ROUND2_STEP(b, c, d, e, a, W, 34);
+	HASHCLASH_SHA1COMPRESS_ROUND2_STEP(a, b, c, d, e, W, 35);
+	HASHCLASH_SHA1COMPRESS_ROUND2_STEP(e, a, b, c, d, W, 36);
+	HASHCLASH_SHA1COMPRESS_ROUND2_STEP(d, e, a, b, c, W, 37);
+	HASHCLASH_SHA1COMPRESS_ROUND2_STEP(c, d, e, a, b, W, 38);
+	HASHCLASH_SHA1COMPRESS_ROUND2_STEP(b, c, d, e, a, W, 39);
+
+	HASHCLASH_SHA1COMPRESS_ROUND3_STEP(a, b, c, d, e, W, 40);
+	HASHCLASH_SHA1COMPRESS_ROUND3_STEP(e, a, b, c, d, W, 41);
+	HASHCLASH_SHA1COMPRESS_ROUND3_STEP(d, e, a, b, c, W, 42);
+	HASHCLASH_SHA1COMPRESS_ROUND3_STEP(c, d, e, a, b, W, 43);
+	HASHCLASH_SHA1COMPRESS_ROUND3_STEP(b, c, d, e, a, W, 44);
+	HASHCLASH_SHA1COMPRESS_ROUND3_STEP(a, b, c, d, e, W, 45);
+	HASHCLASH_SHA1COMPRESS_ROUND3_STEP(e, a, b, c, d, W, 46);
+	HASHCLASH_SHA1COMPRESS_ROUND3_STEP(d, e, a, b, c, W, 47);
+	HASHCLASH_SHA1COMPRESS_ROUND3_STEP(c, d, e, a, b, W, 48);
+	HASHCLASH_SHA1COMPRESS_ROUND3_STEP(b, c, d, e, a, W, 49);
+	HASHCLASH_SHA1COMPRESS_ROUND3_STEP(a, b, c, d, e, W, 50);
+	HASHCLASH_SHA1COMPRESS_ROUND3_STEP(e, a, b, c, d, W, 51);
+	HASHCLASH_SHA1COMPRESS_ROUND3_STEP(d, e, a, b, c, W, 52);
+	HASHCLASH_SHA1COMPRESS_ROUND3_STEP(c, d, e, a, b, W, 53);
+	HASHCLASH_SHA1COMPRESS_ROUND3_STEP(b, c, d, e, a, W, 54);
+	HASHCLASH_SHA1COMPRESS_ROUND3_STEP(a, b, c, d, e, W, 55);
+	HASHCLASH_SHA1COMPRESS_ROUND3_STEP(e, a, b, c, d, W, 56);
+	HASHCLASH_SHA1COMPRESS_ROUND3_STEP(d, e, a, b, c, W, 57);
+	HASHCLASH_SHA1COMPRESS_ROUND3_STEP(c, d, e, a, b, W, 58);
+	HASHCLASH_SHA1COMPRESS_ROUND3_STEP(b, c, d, e, a, W, 59);
+
+	HASHCLASH_SHA1COMPRESS_ROUND4_STEP(a, b, c, d, e, W, 60);
+	HASHCLASH_SHA1COMPRESS_ROUND4_STEP(e, a, b, c, d, W, 61);
+	HASHCLASH_SHA1COMPRESS_ROUND4_STEP(d, e, a, b, c, W, 62);
+	HASHCLASH_SHA1COMPRESS_ROUND4_STEP(c, d, e, a, b, W, 63);
+	HASHCLASH_SHA1COMPRESS_ROUND4_STEP(b, c, d, e, a, W, 64);
+	HASHCLASH_SHA1COMPRESS_ROUND4_STEP(a, b, c, d, e, W, 65);
+	HASHCLASH_SHA1COMPRESS_ROUND4_STEP(e, a, b, c, d, W, 66);
+	HASHCLASH_SHA1COMPRESS_ROUND4_STEP(d, e, a, b, c, W, 67);
+	HASHCLASH_SHA1COMPRESS_ROUND4_STEP(c, d, e, a, b, W, 68);
+	HASHCLASH_SHA1COMPRESS_ROUND4_STEP(b, c, d, e, a, W, 69);
+	HASHCLASH_SHA1COMPRESS_ROUND4_STEP(a, b, c, d, e, W, 70);
+	HASHCLASH_SHA1COMPRESS_ROUND4_STEP(e, a, b, c, d, W, 71);
+	HASHCLASH_SHA1COMPRESS_ROUND4_STEP(d, e, a, b, c, W, 72);
+	HASHCLASH_SHA1COMPRESS_ROUND4_STEP(c, d, e, a, b, W, 73);
+	HASHCLASH_SHA1COMPRESS_ROUND4_STEP(b, c, d, e, a, W, 74);
+	HASHCLASH_SHA1COMPRESS_ROUND4_STEP(a, b, c, d, e, W, 75);
+	HASHCLASH_SHA1COMPRESS_ROUND4_STEP(e, a, b, c, d, W, 76);
+	HASHCLASH_SHA1COMPRESS_ROUND4_STEP(d, e, a, b, c, W, 77);
+	HASHCLASH_SHA1COMPRESS_ROUND4_STEP(c, d, e, a, b, W, 78);
+	HASHCLASH_SHA1COMPRESS_ROUND4_STEP(b, c, d, e, a, W, 79);
+
+	ihv[0] += a; ihv[1] += b; ihv[2] += c; ihv[3] += d; ihv[4] += e;
+}
+
+
+
+void sha1_compression_W(uint32_t ihv[5], const uint32_t W[80])
+{
+	uint32_t a = ihv[0], b = ihv[1], c = ihv[2], d = ihv[3], e = ihv[4];
+
+	HASHCLASH_SHA1COMPRESS_ROUND1_STEP(a, b, c, d, e, W, 0);
+	HASHCLASH_SHA1COMPRESS_ROUND1_STEP(e, a, b, c, d, W, 1);
+	HASHCLASH_SHA1COMPRESS_ROUND1_STEP(d, e, a, b, c, W, 2);
+	HASHCLASH_SHA1COMPRESS_ROUND1_STEP(c, d, e, a, b, W, 3);
+	HASHCLASH_SHA1COMPRESS_ROUND1_STEP(b, c, d, e, a, W, 4);
+	HASHCLASH_SHA1COMPRESS_ROUND1_STEP(a, b, c, d, e, W, 5);
+	HASHCLASH_SHA1COMPRESS_ROUND1_STEP(e, a, b, c, d, W, 6);
+	HASHCLASH_SHA1COMPRESS_ROUND1_STEP(d, e, a, b, c, W, 7);
+	HASHCLASH_SHA1COMPRESS_ROUND1_STEP(c, d, e, a, b, W, 8);
+	HASHCLASH_SHA1COMPRESS_ROUND1_STEP(b, c, d, e, a, W, 9);
+	HASHCLASH_SHA1COMPRESS_ROUND1_STEP(a, b, c, d, e, W, 10);
+	HASHCLASH_SHA1COMPRESS_ROUND1_STEP(e, a, b, c, d, W, 11);
+	HASHCLASH_SHA1COMPRESS_ROUND1_STEP(d, e, a, b, c, W, 12);
+	HASHCLASH_SHA1COMPRESS_ROUND1_STEP(c, d, e, a, b, W, 13);
+	HASHCLASH_SHA1COMPRESS_ROUND1_STEP(b, c, d, e, a, W, 14);
+	HASHCLASH_SHA1COMPRESS_ROUND1_STEP(a, b, c, d, e, W, 15);
+	HASHCLASH_SHA1COMPRESS_ROUND1_STEP(e, a, b, c, d, W, 16);
+	HASHCLASH_SHA1COMPRESS_ROUND1_STEP(d, e, a, b, c, W, 17);
+	HASHCLASH_SHA1COMPRESS_ROUND1_STEP(c, d, e, a, b, W, 18);
+	HASHCLASH_SHA1COMPRESS_ROUND1_STEP(b, c, d, e, a, W, 19);
+
+	HASHCLASH_SHA1COMPRESS_ROUND2_STEP(a, b, c, d, e, W, 20);
+	HASHCLASH_SHA1COMPRESS_ROUND2_STEP(e, a, b, c, d, W, 21);
+	HASHCLASH_SHA1COMPRESS_ROUND2_STEP(d, e, a, b, c, W, 22);
+	HASHCLASH_SHA1COMPRESS_ROUND2_STEP(c, d, e, a, b, W, 23);
+	HASHCLASH_SHA1COMPRESS_ROUND2_STEP(b, c, d, e, a, W, 24);
+	HASHCLASH_SHA1COMPRESS_ROUND2_STEP(a, b, c, d, e, W, 25);
+	HASHCLASH_SHA1COMPRESS_ROUND2_STEP(e, a, b, c, d, W, 26);
+	HASHCLASH_SHA1COMPRESS_ROUND2_STEP(d, e, a, b, c, W, 27);
+	HASHCLASH_SHA1COMPRESS_ROUND2_STEP(c, d, e, a, b, W, 28);
+	HASHCLASH_SHA1COMPRESS_ROUND2_STEP(b, c, d, e, a, W, 29);
+	HASHCLASH_SHA1COMPRESS_ROUND2_STEP(a, b, c, d, e, W, 30);
+	HASHCLASH_SHA1COMPRESS_ROUND2_STEP(e, a, b, c, d, W, 31);
+	HASHCLASH_SHA1COMPRESS_ROUND2_STEP(d, e, a, b, c, W, 32);
+	HASHCLASH_SHA1COMPRESS_ROUND2_STEP(c, d, e, a, b, W, 33);
+	HASHCLASH_SHA1COMPRESS_ROUND2_STEP(b, c, d, e, a, W, 34);
+	HASHCLASH_SHA1COMPRESS_ROUND2_STEP(a, b, c, d, e, W, 35);
+	HASHCLASH_SHA1COMPRESS_ROUND2_STEP(e, a, b, c, d, W, 36);
+	HASHCLASH_SHA1COMPRESS_ROUND2_STEP(d, e, a, b, c, W, 37);
+	HASHCLASH_SHA1COMPRESS_ROUND2_STEP(c, d, e, a, b, W, 38);
+	HASHCLASH_SHA1COMPRESS_ROUND2_STEP(b, c, d, e, a, W, 39);
+
+	HASHCLASH_SHA1COMPRESS_ROUND3_STEP(a, b, c, d, e, W, 40);
+	HASHCLASH_SHA1COMPRESS_ROUND3_STEP(e, a, b, c, d, W, 41);
+	HASHCLASH_SHA1COMPRESS_ROUND3_STEP(d, e, a, b, c, W, 42);
+	HASHCLASH_SHA1COMPRESS_ROUND3_STEP(c, d, e, a, b, W, 43);
+	HASHCLASH_SHA1COMPRESS_ROUND3_STEP(b, c, d, e, a, W, 44);
+	HASHCLASH_SHA1COMPRESS_ROUND3_STEP(a, b, c, d, e, W, 45);
+	HASHCLASH_SHA1COMPRESS_ROUND3_STEP(e, a, b, c, d, W, 46);
+	HASHCLASH_SHA1COMPRESS_ROUND3_STEP(d, e, a, b, c, W, 47);
+	HASHCLASH_SHA1COMPRESS_ROUND3_STEP(c, d, e, a, b, W, 48);
+	HASHCLASH_SHA1COMPRESS_ROUND3_STEP(b, c, d, e, a, W, 49);
+	HASHCLASH_SHA1COMPRESS_ROUND3_STEP(a, b, c, d, e, W, 50);
+	HASHCLASH_SHA1COMPRESS_ROUND3_STEP(e, a, b, c, d, W, 51);
+	HASHCLASH_SHA1COMPRESS_ROUND3_STEP(d, e, a, b, c, W, 52);
+	HASHCLASH_SHA1COMPRESS_ROUND3_STEP(c, d, e, a, b, W, 53);
+	HASHCLASH_SHA1COMPRESS_ROUND3_STEP(b, c, d, e, a, W, 54);
+	HASHCLASH_SHA1COMPRESS_ROUND3_STEP(a, b, c, d, e, W, 55);
+	HASHCLASH_SHA1COMPRESS_ROUND3_STEP(e, a, b, c, d, W, 56);
+	HASHCLASH_SHA1COMPRESS_ROUND3_STEP(d, e, a, b, c, W, 57);
+	HASHCLASH_SHA1COMPRESS_ROUND3_STEP(c, d, e, a, b, W, 58);
+	HASHCLASH_SHA1COMPRESS_ROUND3_STEP(b, c, d, e, a, W, 59);
+
+	HASHCLASH_SHA1COMPRESS_ROUND4_STEP(a, b, c, d, e, W, 60);
+	HASHCLASH_SHA1COMPRESS_ROUND4_STEP(e, a, b, c, d, W, 61);
+	HASHCLASH_SHA1COMPRESS_ROUND4_STEP(d, e, a, b, c, W, 62);
+	HASHCLASH_SHA1COMPRESS_ROUND4_STEP(c, d, e, a, b, W, 63);
+	HASHCLASH_SHA1COMPRESS_ROUND4_STEP(b, c, d, e, a, W, 64);
+	HASHCLASH_SHA1COMPRESS_ROUND4_STEP(a, b, c, d, e, W, 65);
+	HASHCLASH_SHA1COMPRESS_ROUND4_STEP(e, a, b, c, d, W, 66);
+	HASHCLASH_SHA1COMPRESS_ROUND4_STEP(d, e, a, b, c, W, 67);
+	HASHCLASH_SHA1COMPRESS_ROUND4_STEP(c, d, e, a, b, W, 68);
+	HASHCLASH_SHA1COMPRESS_ROUND4_STEP(b, c, d, e, a, W, 69);
+	HASHCLASH_SHA1COMPRESS_ROUND4_STEP(a, b, c, d, e, W, 70);
+	HASHCLASH_SHA1COMPRESS_ROUND4_STEP(e, a, b, c, d, W, 71);
+	HASHCLASH_SHA1COMPRESS_ROUND4_STEP(d, e, a, b, c, W, 72);
+	HASHCLASH_SHA1COMPRESS_ROUND4_STEP(c, d, e, a, b, W, 73);
+	HASHCLASH_SHA1COMPRESS_ROUND4_STEP(b, c, d, e, a, W, 74);
+	HASHCLASH_SHA1COMPRESS_ROUND4_STEP(a, b, c, d, e, W, 75);
+	HASHCLASH_SHA1COMPRESS_ROUND4_STEP(e, a, b, c, d, W, 76);
+	HASHCLASH_SHA1COMPRESS_ROUND4_STEP(d, e, a, b, c, W, 77);
+	HASHCLASH_SHA1COMPRESS_ROUND4_STEP(c, d, e, a, b, W, 78);
+	HASHCLASH_SHA1COMPRESS_ROUND4_STEP(b, c, d, e, a, W, 79);
+
+	ihv[0] += a; ihv[1] += b; ihv[2] += c; ihv[3] += d; ihv[4] += e;
+}
+
+
+
+void sha1_compression_states(uint32_t ihv[5], const uint32_t W[80], uint32_t states[80][5])
+{
+	uint32_t a = ihv[0], b = ihv[1], c = ihv[2], d = ihv[3], e = ihv[4];
+
+#ifdef DOSTORESTATE00
+	SHA1_STORE_STATE(0)
+#endif
+	HASHCLASH_SHA1COMPRESS_ROUND1_STEP(a, b, c, d, e, W, 0);
+
+#ifdef DOSTORESTATE01
+	SHA1_STORE_STATE(1)
+#endif
+	HASHCLASH_SHA1COMPRESS_ROUND1_STEP(e, a, b, c, d, W, 1);
+
+#ifdef DOSTORESTATE02
+	SHA1_STORE_STATE(2)
+#endif
+	HASHCLASH_SHA1COMPRESS_ROUND1_STEP(d, e, a, b, c, W, 2);
+
+#ifdef DOSTORESTATE03
+	SHA1_STORE_STATE(3)
+#endif
+	HASHCLASH_SHA1COMPRESS_ROUND1_STEP(c, d, e, a, b, W, 3);
+
+#ifdef DOSTORESTATE04
+	SHA1_STORE_STATE(4)
+#endif
+	HASHCLASH_SHA1COMPRESS_ROUND1_STEP(b, c, d, e, a, W, 4);
+
+#ifdef DOSTORESTATE05
+	SHA1_STORE_STATE(5)
+#endif
+	HASHCLASH_SHA1COMPRESS_ROUND1_STEP(a, b, c, d, e, W, 5);
+
+#ifdef DOSTORESTATE06
+	SHA1_STORE_STATE(6)
+#endif
+	HASHCLASH_SHA1COMPRESS_ROUND1_STEP(e, a, b, c, d, W, 6);
+
+#ifdef DOSTORESTATE07
+	SHA1_STORE_STATE(7)
+#endif
+	HASHCLASH_SHA1COMPRESS_ROUND1_STEP(d, e, a, b, c, W, 7);
+
+#ifdef DOSTORESTATE08
+	SHA1_STORE_STATE(8)
+#endif
+	HASHCLASH_SHA1COMPRESS_ROUND1_STEP(c, d, e, a, b, W, 8);
+
+#ifdef DOSTORESTATE09
+	SHA1_STORE_STATE(9)
+#endif
+	HASHCLASH_SHA1COMPRESS_ROUND1_STEP(b, c, d, e, a, W, 9);
+
+#ifdef DOSTORESTATE10
+	SHA1_STORE_STATE(10)
+#endif
+	HASHCLASH_SHA1COMPRESS_ROUND1_STEP(a, b, c, d, e, W, 10);
+
+#ifdef DOSTORESTATE11
+	SHA1_STORE_STATE(11)
+#endif
+	HASHCLASH_SHA1COMPRESS_ROUND1_STEP(e, a, b, c, d, W, 11);
+
+#ifdef DOSTORESTATE12
+	SHA1_STORE_STATE(12)
+#endif
+	HASHCLASH_SHA1COMPRESS_ROUND1_STEP(d, e, a, b, c, W, 12);
+
+#ifdef DOSTORESTATE13
+	SHA1_STORE_STATE(13)
+#endif
+	HASHCLASH_SHA1COMPRESS_ROUND1_STEP(c, d, e, a, b, W, 13);
+
+#ifdef DOSTORESTATE14
+	SHA1_STORE_STATE(14)
+#endif
+	HASHCLASH_SHA1COMPRESS_ROUND1_STEP(b, c, d, e, a, W, 14);
+
+#ifdef DOSTORESTATE15
+	SHA1_STORE_STATE(15)
+#endif
+	HASHCLASH_SHA1COMPRESS_ROUND1_STEP(a, b, c, d, e, W, 15);
+
+#ifdef DOSTORESTATE16
+	SHA1_STORE_STATE(16)
+#endif
+	HASHCLASH_SHA1COMPRESS_ROUND1_STEP(e, a, b, c, d, W, 16);
+
+#ifdef DOSTORESTATE17
+	SHA1_STORE_STATE(17)
+#endif
+	HASHCLASH_SHA1COMPRESS_ROUND1_STEP(d, e, a, b, c, W, 17);
+
+#ifdef DOSTORESTATE18
+	SHA1_STORE_STATE(18)
+#endif
+	HASHCLASH_SHA1COMPRESS_ROUND1_STEP(c, d, e, a, b, W, 18);
+
+#ifdef DOSTORESTATE19
+	SHA1_STORE_STATE(19)
+#endif
+	HASHCLASH_SHA1COMPRESS_ROUND1_STEP(b, c, d, e, a, W, 19);
+
+
+
+#ifdef DOSTORESTATE20
+	SHA1_STORE_STATE(20)
+#endif
+	HASHCLASH_SHA1COMPRESS_ROUND2_STEP(a, b, c, d, e, W, 20);
+
+#ifdef DOSTORESTATE21
+	SHA1_STORE_STATE(21)
+#endif
+	HASHCLASH_SHA1COMPRESS_ROUND2_STEP(e, a, b, c, d, W, 21);
+	
+#ifdef DOSTORESTATE22
+	SHA1_STORE_STATE(22)
+#endif
+	HASHCLASH_SHA1COMPRESS_ROUND2_STEP(d, e, a, b, c, W, 22);
+	
+#ifdef DOSTORESTATE23
+	SHA1_STORE_STATE(23)
+#endif
+	HASHCLASH_SHA1COMPRESS_ROUND2_STEP(c, d, e, a, b, W, 23);
+
+#ifdef DOSTORESTATE24
+	SHA1_STORE_STATE(24)
+#endif
+	HASHCLASH_SHA1COMPRESS_ROUND2_STEP(b, c, d, e, a, W, 24);
+
+#ifdef DOSTORESTATE25
+	SHA1_STORE_STATE(25)
+#endif
+	HASHCLASH_SHA1COMPRESS_ROUND2_STEP(a, b, c, d, e, W, 25);
+
+#ifdef DOSTORESTATE26
+	SHA1_STORE_STATE(26)
+#endif
+	HASHCLASH_SHA1COMPRESS_ROUND2_STEP(e, a, b, c, d, W, 26);
+
+#ifdef DOSTORESTATE27
+	SHA1_STORE_STATE(27)
+#endif
+	HASHCLASH_SHA1COMPRESS_ROUND2_STEP(d, e, a, b, c, W, 27);
+	
+#ifdef DOSTORESTATE28
+	SHA1_STORE_STATE(28)
+#endif
+	HASHCLASH_SHA1COMPRESS_ROUND2_STEP(c, d, e, a, b, W, 28);
+	
+#ifdef DOSTORESTATE29
+	SHA1_STORE_STATE(29)
+#endif
+	HASHCLASH_SHA1COMPRESS_ROUND2_STEP(b, c, d, e, a, W, 29);
+	
+#ifdef DOSTORESTATE30
+	SHA1_STORE_STATE(30)
+#endif
+	HASHCLASH_SHA1COMPRESS_ROUND2_STEP(a, b, c, d, e, W, 30);
+	
+#ifdef DOSTORESTATE31
+	SHA1_STORE_STATE(31)
+#endif
+	HASHCLASH_SHA1COMPRESS_ROUND2_STEP(e, a, b, c, d, W, 31);
+	
+#ifdef DOSTORESTATE32
+	SHA1_STORE_STATE(32)
+#endif
+	HASHCLASH_SHA1COMPRESS_ROUND2_STEP(d, e, a, b, c, W, 32);
+
+#ifdef DOSTORESTATE33
+	SHA1_STORE_STATE(33)
+#endif
+	HASHCLASH_SHA1COMPRESS_ROUND2_STEP(c, d, e, a, b, W, 33);
+
+#ifdef DOSTORESTATE34
+	SHA1_STORE_STATE(34)
+#endif
+	HASHCLASH_SHA1COMPRESS_ROUND2_STEP(b, c, d, e, a, W, 34);
+
+#ifdef DOSTORESTATE35
+	SHA1_STORE_STATE(35)
+#endif
+	HASHCLASH_SHA1COMPRESS_ROUND2_STEP(a, b, c, d, e, W, 35);
+	
+#ifdef DOSTORESTATE36
+	SHA1_STORE_STATE(36)
+#endif
+	HASHCLASH_SHA1COMPRESS_ROUND2_STEP(e, a, b, c, d, W, 36);
+	
+#ifdef DOSTORESTATE37
+	SHA1_STORE_STATE(37)
+#endif
+	HASHCLASH_SHA1COMPRESS_ROUND2_STEP(d, e, a, b, c, W, 37);
+	
+#ifdef DOSTORESTATE38
+	SHA1_STORE_STATE(38)
+#endif
+	HASHCLASH_SHA1COMPRESS_ROUND2_STEP(c, d, e, a, b, W, 38);
+	
+#ifdef DOSTORESTATE39
+	SHA1_STORE_STATE(39)
+#endif
+	HASHCLASH_SHA1COMPRESS_ROUND2_STEP(b, c, d, e, a, W, 39);
+
+
+
+#ifdef DOSTORESTATE40
+	SHA1_STORE_STATE(40)
+#endif
+	HASHCLASH_SHA1COMPRESS_ROUND3_STEP(a, b, c, d, e, W, 40);
+
+#ifdef DOSTORESTATE41
+	SHA1_STORE_STATE(41)
+#endif
+	HASHCLASH_SHA1COMPRESS_ROUND3_STEP(e, a, b, c, d, W, 41);
+
+#ifdef DOSTORESTATE42
+	SHA1_STORE_STATE(42)
+#endif
+	HASHCLASH_SHA1COMPRESS_ROUND3_STEP(d, e, a, b, c, W, 42);
+
+#ifdef DOSTORESTATE43
+	SHA1_STORE_STATE(43)
+#endif
+	HASHCLASH_SHA1COMPRESS_ROUND3_STEP(c, d, e, a, b, W, 43);
+
+#ifdef DOSTORESTATE44
+	SHA1_STORE_STATE(44)
+#endif
+	HASHCLASH_SHA1COMPRESS_ROUND3_STEP(b, c, d, e, a, W, 44);
+
+#ifdef DOSTORESTATE45
+	SHA1_STORE_STATE(45)
+#endif
+	HASHCLASH_SHA1COMPRESS_ROUND3_STEP(a, b, c, d, e, W, 45);
+
+#ifdef DOSTORESTATE46
+	SHA1_STORE_STATE(46)
+#endif
+	HASHCLASH_SHA1COMPRESS_ROUND3_STEP(e, a, b, c, d, W, 46);
+
+#ifdef DOSTORESTATE47
+	SHA1_STORE_STATE(47)
+#endif
+	HASHCLASH_SHA1COMPRESS_ROUND3_STEP(d, e, a, b, c, W, 47);
+
+#ifdef DOSTORESTATE48
+	SHA1_STORE_STATE(48)
+#endif
+	HASHCLASH_SHA1COMPRESS_ROUND3_STEP(c, d, e, a, b, W, 48);
+
+#ifdef DOSTORESTATE49
+	SHA1_STORE_STATE(49)
+#endif
+	HASHCLASH_SHA1COMPRESS_ROUND3_STEP(b, c, d, e, a, W, 49);
+
+#ifdef DOSTORESTATE50
+	SHA1_STORE_STATE(50)
+#endif
+	HASHCLASH_SHA1COMPRESS_ROUND3_STEP(a, b, c, d, e, W, 50);
+
+#ifdef DOSTORESTATE51
+	SHA1_STORE_STATE(51)
+#endif
+	HASHCLASH_SHA1COMPRESS_ROUND3_STEP(e, a, b, c, d, W, 51);
+
+#ifdef DOSTORESTATE52
+	SHA1_STORE_STATE(52)
+#endif
+	HASHCLASH_SHA1COMPRESS_ROUND3_STEP(d, e, a, b, c, W, 52);
+
+#ifdef DOSTORESTATE53
+	SHA1_STORE_STATE(53)
+#endif
+	HASHCLASH_SHA1COMPRESS_ROUND3_STEP(c, d, e, a, b, W, 53);
+
+#ifdef DOSTORESTATE54
+	SHA1_STORE_STATE(54)
+#endif
+	HASHCLASH_SHA1COMPRESS_ROUND3_STEP(b, c, d, e, a, W, 54);
+
+#ifdef DOSTORESTATE55
+	SHA1_STORE_STATE(55)
+#endif
+	HASHCLASH_SHA1COMPRESS_ROUND3_STEP(a, b, c, d, e, W, 55);
+
+#ifdef DOSTORESTATE56
+	SHA1_STORE_STATE(56)
+#endif
+	HASHCLASH_SHA1COMPRESS_ROUND3_STEP(e, a, b, c, d, W, 56);
+
+#ifdef DOSTORESTATE57
+	SHA1_STORE_STATE(57)
+#endif
+	HASHCLASH_SHA1COMPRESS_ROUND3_STEP(d, e, a, b, c, W, 57);
+
+#ifdef DOSTORESTATE58
+	SHA1_STORE_STATE(58)
+#endif
+	HASHCLASH_SHA1COMPRESS_ROUND3_STEP(c, d, e, a, b, W, 58);
+
+#ifdef DOSTORESTATE59
+	SHA1_STORE_STATE(59)
+#endif
+	HASHCLASH_SHA1COMPRESS_ROUND3_STEP(b, c, d, e, a, W, 59);
+	
+
+
+
+#ifdef DOSTORESTATE60
+	SHA1_STORE_STATE(60)
+#endif
+	HASHCLASH_SHA1COMPRESS_ROUND4_STEP(a, b, c, d, e, W, 60);
+
+#ifdef DOSTORESTATE61
+	SHA1_STORE_STATE(61)
+#endif
+	HASHCLASH_SHA1COMPRESS_ROUND4_STEP(e, a, b, c, d, W, 61);
+
+#ifdef DOSTORESTATE62
+	SHA1_STORE_STATE(62)
+#endif
+	HASHCLASH_SHA1COMPRESS_ROUND4_STEP(d, e, a, b, c, W, 62);
+
+#ifdef DOSTORESTATE63
+	SHA1_STORE_STATE(63)
+#endif
+	HASHCLASH_SHA1COMPRESS_ROUND4_STEP(c, d, e, a, b, W, 63);
+
+#ifdef DOSTORESTATE64
+	SHA1_STORE_STATE(64)
+#endif
+	HASHCLASH_SHA1COMPRESS_ROUND4_STEP(b, c, d, e, a, W, 64);
+
+#ifdef DOSTORESTATE65
+	SHA1_STORE_STATE(65)
+#endif
+	HASHCLASH_SHA1COMPRESS_ROUND4_STEP(a, b, c, d, e, W, 65);
+
+#ifdef DOSTORESTATE66
+	SHA1_STORE_STATE(66)
+#endif
+	HASHCLASH_SHA1COMPRESS_ROUND4_STEP(e, a, b, c, d, W, 66);
+
+#ifdef DOSTORESTATE67
+	SHA1_STORE_STATE(67)
+#endif
+	HASHCLASH_SHA1COMPRESS_ROUND4_STEP(d, e, a, b, c, W, 67);
+
+#ifdef DOSTORESTATE68
+	SHA1_STORE_STATE(68)
+#endif
+	HASHCLASH_SHA1COMPRESS_ROUND4_STEP(c, d, e, a, b, W, 68);
+
+#ifdef DOSTORESTATE69
+	SHA1_STORE_STATE(69)
+#endif
+	HASHCLASH_SHA1COMPRESS_ROUND4_STEP(b, c, d, e, a, W, 69);
+
+#ifdef DOSTORESTATE70
+	SHA1_STORE_STATE(70)
+#endif
+	HASHCLASH_SHA1COMPRESS_ROUND4_STEP(a, b, c, d, e, W, 70);
+
+#ifdef DOSTORESTATE71
+	SHA1_STORE_STATE(71)
+#endif
+	HASHCLASH_SHA1COMPRESS_ROUND4_STEP(e, a, b, c, d, W, 71);
+
+#ifdef DOSTORESTATE72
+	SHA1_STORE_STATE(72)
+#endif
+	HASHCLASH_SHA1COMPRESS_ROUND4_STEP(d, e, a, b, c, W, 72);
+
+#ifdef DOSTORESTATE73
+	SHA1_STORE_STATE(73)
+#endif
+	HASHCLASH_SHA1COMPRESS_ROUND4_STEP(c, d, e, a, b, W, 73);
+
+#ifdef DOSTORESTATE74
+	SHA1_STORE_STATE(74)
+#endif
+	HASHCLASH_SHA1COMPRESS_ROUND4_STEP(b, c, d, e, a, W, 74);
+
+#ifdef DOSTORESTATE75
+	SHA1_STORE_STATE(75)
+#endif
+	HASHCLASH_SHA1COMPRESS_ROUND4_STEP(a, b, c, d, e, W, 75);
+
+#ifdef DOSTORESTATE76
+	SHA1_STORE_STATE(76)
+#endif
+	HASHCLASH_SHA1COMPRESS_ROUND4_STEP(e, a, b, c, d, W, 76);
+
+#ifdef DOSTORESTATE77
+	SHA1_STORE_STATE(77)
+#endif
+	HASHCLASH_SHA1COMPRESS_ROUND4_STEP(d, e, a, b, c, W, 77);
+
+#ifdef DOSTORESTATE78
+	SHA1_STORE_STATE(78)
+#endif
+	HASHCLASH_SHA1COMPRESS_ROUND4_STEP(c, d, e, a, b, W, 78);
+
+#ifdef DOSTORESTATE79
+	SHA1_STORE_STATE(79)
+#endif
+	HASHCLASH_SHA1COMPRESS_ROUND4_STEP(b, c, d, e, a, W, 79);
+
+
+
+	ihv[0] += a; ihv[1] += b; ihv[2] += c; ihv[3] += d; ihv[4] += e;
+}
+
+
+
+
+#define SHA1_RECOMPRESS(t) \
+void sha1recompress_fast_ ## t (uint32_t ihvin[5], uint32_t ihvout[5], const uint32_t me2[80], const uint32_t state[5]) \
+{ \
+	uint32_t a = state[0], b = state[1], c = state[2], d = state[3], e = state[4]; \
+	if (t > 79) HASHCLASH_SHA1COMPRESS_ROUND4_STEP_BW(b, c, d, e, a, me2, 79); \
+	if (t > 78) HASHCLASH_SHA1COMPRESS_ROUND4_STEP_BW(c, d, e, a, b, me2, 78); \
+	if (t > 77) HASHCLASH_SHA1COMPRESS_ROUND4_STEP_BW(d, e, a, b, c, me2, 77); \
+	if (t > 76) HASHCLASH_SHA1COMPRESS_ROUND4_STEP_BW(e, a, b, c, d, me2, 76); \
+	if (t > 75) HASHCLASH_SHA1COMPRESS_ROUND4_STEP_BW(a, b, c, d, e, me2, 75); \
+	if (t > 74) HASHCLASH_SHA1COMPRESS_ROUND4_STEP_BW(b, c, d, e, a, me2, 74); \
+	if (t > 73) HASHCLASH_SHA1COMPRESS_ROUND4_STEP_BW(c, d, e, a, b, me2, 73); \
+	if (t > 72) HASHCLASH_SHA1COMPRESS_ROUND4_STEP_BW(d, e, a, b, c, me2, 72); \
+	if (t > 71) HASHCLASH_SHA1COMPRESS_ROUND4_STEP_BW(e, a, b, c, d, me2, 71); \
+	if (t > 70) HASHCLASH_SHA1COMPRESS_ROUND4_STEP_BW(a, b, c, d, e, me2, 70); \
+	if (t > 69) HASHCLASH_SHA1COMPRESS_ROUND4_STEP_BW(b, c, d, e, a, me2, 69); \
+	if (t > 68) HASHCLASH_SHA1COMPRESS_ROUND4_STEP_BW(c, d, e, a, b, me2, 68); \
+	if (t > 67) HASHCLASH_SHA1COMPRESS_ROUND4_STEP_BW(d, e, a, b, c, me2, 67); \
+	if (t > 66) HASHCLASH_SHA1COMPRESS_ROUND4_STEP_BW(e, a, b, c, d, me2, 66); \
+	if (t > 65) HASHCLASH_SHA1COMPRESS_ROUND4_STEP_BW(a, b, c, d, e, me2, 65); \
+	if (t > 64) HASHCLASH_SHA1COMPRESS_ROUND4_STEP_BW(b, c, d, e, a, me2, 64); \
+	if (t > 63) HASHCLASH_SHA1COMPRESS_ROUND4_STEP_BW(c, d, e, a, b, me2, 63); \
+	if (t > 62) HASHCLASH_SHA1COMPRESS_ROUND4_STEP_BW(d, e, a, b, c, me2, 62); \
+	if (t > 61) HASHCLASH_SHA1COMPRESS_ROUND4_STEP_BW(e, a, b, c, d, me2, 61); \
+	if (t > 60) HASHCLASH_SHA1COMPRESS_ROUND4_STEP_BW(a, b, c, d, e, me2, 60); \
+	if (t > 59) HASHCLASH_SHA1COMPRESS_ROUND3_STEP_BW(b, c, d, e, a, me2, 59); \
+	if (t > 58) HASHCLASH_SHA1COMPRESS_ROUND3_STEP_BW(c, d, e, a, b, me2, 58); \
+	if (t > 57) HASHCLASH_SHA1COMPRESS_ROUND3_STEP_BW(d, e, a, b, c, me2, 57); \
+	if (t > 56) HASHCLASH_SHA1COMPRESS_ROUND3_STEP_BW(e, a, b, c, d, me2, 56); \
+	if (t > 55) HASHCLASH_SHA1COMPRESS_ROUND3_STEP_BW(a, b, c, d, e, me2, 55); \
+	if (t > 54) HASHCLASH_SHA1COMPRESS_ROUND3_STEP_BW(b, c, d, e, a, me2, 54); \
+	if (t > 53) HASHCLASH_SHA1COMPRESS_ROUND3_STEP_BW(c, d, e, a, b, me2, 53); \
+	if (t > 52) HASHCLASH_SHA1COMPRESS_ROUND3_STEP_BW(d, e, a, b, c, me2, 52); \
+	if (t > 51) HASHCLASH_SHA1COMPRESS_ROUND3_STEP_BW(e, a, b, c, d, me2, 51); \
+	if (t > 50) HASHCLASH_SHA1COMPRESS_ROUND3_STEP_BW(a, b, c, d, e, me2, 50); \
+	if (t > 49) HASHCLASH_SHA1COMPRESS_ROUND3_STEP_BW(b, c, d, e, a, me2, 49); \
+	if (t > 48) HASHCLASH_SHA1COMPRESS_ROUND3_STEP_BW(c, d, e, a, b, me2, 48); \
+	if (t > 47) HASHCLASH_SHA1COMPRESS_ROUND3_STEP_BW(d, e, a, b, c, me2, 47); \
+	if (t > 46) HASHCLASH_SHA1COMPRESS_ROUND3_STEP_BW(e, a, b, c, d, me2, 46); \
+	if (t > 45) HASHCLASH_SHA1COMPRESS_ROUND3_STEP_BW(a, b, c, d, e, me2, 45); \
+	if (t > 44) HASHCLASH_SHA1COMPRESS_ROUND3_STEP_BW(b, c, d, e, a, me2, 44); \
+	if (t > 43) HASHCLASH_SHA1COMPRESS_ROUND3_STEP_BW(c, d, e, a, b, me2, 43); \
+	if (t > 42) HASHCLASH_SHA1COMPRESS_ROUND3_STEP_BW(d, e, a, b, c, me2, 42); \
+	if (t > 41) HASHCLASH_SHA1COMPRESS_ROUND3_STEP_BW(e, a, b, c, d, me2, 41); \
+	if (t > 40) HASHCLASH_SHA1COMPRESS_ROUND3_STEP_BW(a, b, c, d, e, me2, 40); \
+	if (t > 39) HASHCLASH_SHA1COMPRESS_ROUND2_STEP_BW(b, c, d, e, a, me2, 39); \
+	if (t > 38) HASHCLASH_SHA1COMPRESS_ROUND2_STEP_BW(c, d, e, a, b, me2, 38); \
+	if (t > 37) HASHCLASH_SHA1COMPRESS_ROUND2_STEP_BW(d, e, a, b, c, me2, 37); \
+	if (t > 36) HASHCLASH_SHA1COMPRESS_ROUND2_STEP_BW(e, a, b, c, d, me2, 36); \
+	if (t > 35) HASHCLASH_SHA1COMPRESS_ROUND2_STEP_BW(a, b, c, d, e, me2, 35); \
+	if (t > 34) HASHCLASH_SHA1COMPRESS_ROUND2_STEP_BW(b, c, d, e, a, me2, 34); \
+	if (t > 33) HASHCLASH_SHA1COMPRESS_ROUND2_STEP_BW(c, d, e, a, b, me2, 33); \
+	if (t > 32) HASHCLASH_SHA1COMPRESS_ROUND2_STEP_BW(d, e, a, b, c, me2, 32); \
+	if (t > 31) HASHCLASH_SHA1COMPRESS_ROUND2_STEP_BW(e, a, b, c, d, me2, 31); \
+	if (t > 30) HASHCLASH_SHA1COMPRESS_ROUND2_STEP_BW(a, b, c, d, e, me2, 30); \
+	if (t > 29) HASHCLASH_SHA1COMPRESS_ROUND2_STEP_BW(b, c, d, e, a, me2, 29); \
+	if (t > 28) HASHCLASH_SHA1COMPRESS_ROUND2_STEP_BW(c, d, e, a, b, me2, 28); \
+	if (t > 27) HASHCLASH_SHA1COMPRESS_ROUND2_STEP_BW(d, e, a, b, c, me2, 27); \
+	if (t > 26) HASHCLASH_SHA1COMPRESS_ROUND2_STEP_BW(e, a, b, c, d, me2, 26); \
+	if (t > 25) HASHCLASH_SHA1COMPRESS_ROUND2_STEP_BW(a, b, c, d, e, me2, 25); \
+	if (t > 24) HASHCLASH_SHA1COMPRESS_ROUND2_STEP_BW(b, c, d, e, a, me2, 24); \
+	if (t > 23) HASHCLASH_SHA1COMPRESS_ROUND2_STEP_BW(c, d, e, a, b, me2, 23); \
+	if (t > 22) HASHCLASH_SHA1COMPRESS_ROUND2_STEP_BW(d, e, a, b, c, me2, 22); \
+	if (t > 21) HASHCLASH_SHA1COMPRESS_ROUND2_STEP_BW(e, a, b, c, d, me2, 21); \
+	if (t > 20) HASHCLASH_SHA1COMPRESS_ROUND2_STEP_BW(a, b, c, d, e, me2, 20); \
+	if (t > 19) HASHCLASH_SHA1COMPRESS_ROUND1_STEP_BW(b, c, d, e, a, me2, 19); \
+	if (t > 18) HASHCLASH_SHA1COMPRESS_ROUND1_STEP_BW(c, d, e, a, b, me2, 18); \
+	if (t > 17) HASHCLASH_SHA1COMPRESS_ROUND1_STEP_BW(d, e, a, b, c, me2, 17); \
+	if (t > 16) HASHCLASH_SHA1COMPRESS_ROUND1_STEP_BW(e, a, b, c, d, me2, 16); \
+	if (t > 15) HASHCLASH_SHA1COMPRESS_ROUND1_STEP_BW(a, b, c, d, e, me2, 15); \
+	if (t > 14) HASHCLASH_SHA1COMPRESS_ROUND1_STEP_BW(b, c, d, e, a, me2, 14); \
+	if (t > 13) HASHCLASH_SHA1COMPRESS_ROUND1_STEP_BW(c, d, e, a, b, me2, 13); \
+	if (t > 12) HASHCLASH_SHA1COMPRESS_ROUND1_STEP_BW(d, e, a, b, c, me2, 12); \
+	if (t > 11) HASHCLASH_SHA1COMPRESS_ROUND1_STEP_BW(e, a, b, c, d, me2, 11); \
+	if (t > 10) HASHCLASH_SHA1COMPRESS_ROUND1_STEP_BW(a, b, c, d, e, me2, 10); \
+	if (t > 9) HASHCLASH_SHA1COMPRESS_ROUND1_STEP_BW(b, c, d, e, a, me2, 9); \
+	if (t > 8) HASHCLASH_SHA1COMPRESS_ROUND1_STEP_BW(c, d, e, a, b, me2, 8); \
+	if (t > 7) HASHCLASH_SHA1COMPRESS_ROUND1_STEP_BW(d, e, a, b, c, me2, 7); \
+	if (t > 6) HASHCLASH_SHA1COMPRESS_ROUND1_STEP_BW(e, a, b, c, d, me2, 6); \
+	if (t > 5) HASHCLASH_SHA1COMPRESS_ROUND1_STEP_BW(a, b, c, d, e, me2, 5); \
+	if (t > 4) HASHCLASH_SHA1COMPRESS_ROUND1_STEP_BW(b, c, d, e, a, me2, 4); \
+	if (t > 3) HASHCLASH_SHA1COMPRESS_ROUND1_STEP_BW(c, d, e, a, b, me2, 3); \
+	if (t > 2) HASHCLASH_SHA1COMPRESS_ROUND1_STEP_BW(d, e, a, b, c, me2, 2); \
+	if (t > 1) HASHCLASH_SHA1COMPRESS_ROUND1_STEP_BW(e, a, b, c, d, me2, 1); \
+	if (t > 0) HASHCLASH_SHA1COMPRESS_ROUND1_STEP_BW(a, b, c, d, e, me2, 0); \
+	ihvin[0] = a; ihvin[1] = b; ihvin[2] = c; ihvin[3] = d; ihvin[4] = e; \
+	a = state[0]; b = state[1]; c = state[2]; d = state[3]; e = state[4]; \
+	if (t <= 0) HASHCLASH_SHA1COMPRESS_ROUND1_STEP(a, b, c, d, e, me2, 0); \
+	if (t <= 1) HASHCLASH_SHA1COMPRESS_ROUND1_STEP(e, a, b, c, d, me2, 1); \
+	if (t <= 2) HASHCLASH_SHA1COMPRESS_ROUND1_STEP(d, e, a, b, c, me2, 2); \
+	if (t <= 3) HASHCLASH_SHA1COMPRESS_ROUND1_STEP(c, d, e, a, b, me2, 3); \
+	if (t <= 4) HASHCLASH_SHA1COMPRESS_ROUND1_STEP(b, c, d, e, a, me2, 4); \
+	if (t <= 5) HASHCLASH_SHA1COMPRESS_ROUND1_STEP(a, b, c, d, e, me2, 5); \
+	if (t <= 6) HASHCLASH_SHA1COMPRESS_ROUND1_STEP(e, a, b, c, d, me2, 6); \
+	if (t <= 7) HASHCLASH_SHA1COMPRESS_ROUND1_STEP(d, e, a, b, c, me2, 7); \
+	if (t <= 8) HASHCLASH_SHA1COMPRESS_ROUND1_STEP(c, d, e, a, b, me2, 8); \
+	if (t <= 9) HASHCLASH_SHA1COMPRESS_ROUND1_STEP(b, c, d, e, a, me2, 9); \
+	if (t <= 10) HASHCLASH_SHA1COMPRESS_ROUND1_STEP(a, b, c, d, e, me2, 10); \
+	if (t <= 11) HASHCLASH_SHA1COMPRESS_ROUND1_STEP(e, a, b, c, d, me2, 11); \
+	if (t <= 12) HASHCLASH_SHA1COMPRESS_ROUND1_STEP(d, e, a, b, c, me2, 12); \
+	if (t <= 13) HASHCLASH_SHA1COMPRESS_ROUND1_STEP(c, d, e, a, b, me2, 13); \
+	if (t <= 14) HASHCLASH_SHA1COMPRESS_ROUND1_STEP(b, c, d, e, a, me2, 14); \
+	if (t <= 15) HASHCLASH_SHA1COMPRESS_ROUND1_STEP(a, b, c, d, e, me2, 15); \
+	if (t <= 16) HASHCLASH_SHA1COMPRESS_ROUND1_STEP(e, a, b, c, d, me2, 16); \
+	if (t <= 17) HASHCLASH_SHA1COMPRESS_ROUND1_STEP(d, e, a, b, c, me2, 17); \
+	if (t <= 18) HASHCLASH_SHA1COMPRESS_ROUND1_STEP(c, d, e, a, b, me2, 18); \
+	if (t <= 19) HASHCLASH_SHA1COMPRESS_ROUND1_STEP(b, c, d, e, a, me2, 19); \
+	if (t <= 20) HASHCLASH_SHA1COMPRESS_ROUND2_STEP(a, b, c, d, e, me2, 20); \
+	if (t <= 21) HASHCLASH_SHA1COMPRESS_ROUND2_STEP(e, a, b, c, d, me2, 21); \
+	if (t <= 22) HASHCLASH_SHA1COMPRESS_ROUND2_STEP(d, e, a, b, c, me2, 22); \
+	if (t <= 23) HASHCLASH_SHA1COMPRESS_ROUND2_STEP(c, d, e, a, b, me2, 23); \
+	if (t <= 24) HASHCLASH_SHA1COMPRESS_ROUND2_STEP(b, c, d, e, a, me2, 24); \
+	if (t <= 25) HASHCLASH_SHA1COMPRESS_ROUND2_STEP(a, b, c, d, e, me2, 25); \
+	if (t <= 26) HASHCLASH_SHA1COMPRESS_ROUND2_STEP(e, a, b, c, d, me2, 26); \
+	if (t <= 27) HASHCLASH_SHA1COMPRESS_ROUND2_STEP(d, e, a, b, c, me2, 27); \
+	if (t <= 28) HASHCLASH_SHA1COMPRESS_ROUND2_STEP(c, d, e, a, b, me2, 28); \
+	if (t <= 29) HASHCLASH_SHA1COMPRESS_ROUND2_STEP(b, c, d, e, a, me2, 29); \
+	if (t <= 30) HASHCLASH_SHA1COMPRESS_ROUND2_STEP(a, b, c, d, e, me2, 30); \
+	if (t <= 31) HASHCLASH_SHA1COMPRESS_ROUND2_STEP(e, a, b, c, d, me2, 31); \
+	if (t <= 32) HASHCLASH_SHA1COMPRESS_ROUND2_STEP(d, e, a, b, c, me2, 32); \
+	if (t <= 33) HASHCLASH_SHA1COMPRESS_ROUND2_STEP(c, d, e, a, b, me2, 33); \
+	if (t <= 34) HASHCLASH_SHA1COMPRESS_ROUND2_STEP(b, c, d, e, a, me2, 34); \
+	if (t <= 35) HASHCLASH_SHA1COMPRESS_ROUND2_STEP(a, b, c, d, e, me2, 35); \
+	if (t <= 36) HASHCLASH_SHA1COMPRESS_ROUND2_STEP(e, a, b, c, d, me2, 36); \
+	if (t <= 37) HASHCLASH_SHA1COMPRESS_ROUND2_STEP(d, e, a, b, c, me2, 37); \
+	if (t <= 38) HASHCLASH_SHA1COMPRESS_ROUND2_STEP(c, d, e, a, b, me2, 38); \
+	if (t <= 39) HASHCLASH_SHA1COMPRESS_ROUND2_STEP(b, c, d, e, a, me2, 39); \
+	if (t <= 40) HASHCLASH_SHA1COMPRESS_ROUND3_STEP(a, b, c, d, e, me2, 40); \
+	if (t <= 41) HASHCLASH_SHA1COMPRESS_ROUND3_STEP(e, a, b, c, d, me2, 41); \
+	if (t <= 42) HASHCLASH_SHA1COMPRESS_ROUND3_STEP(d, e, a, b, c, me2, 42); \
+	if (t <= 43) HASHCLASH_SHA1COMPRESS_ROUND3_STEP(c, d, e, a, b, me2, 43); \
+	if (t <= 44) HASHCLASH_SHA1COMPRESS_ROUND3_STEP(b, c, d, e, a, me2, 44); \
+	if (t <= 45) HASHCLASH_SHA1COMPRESS_ROUND3_STEP(a, b, c, d, e, me2, 45); \
+	if (t <= 46) HASHCLASH_SHA1COMPRESS_ROUND3_STEP(e, a, b, c, d, me2, 46); \
+	if (t <= 47) HASHCLASH_SHA1COMPRESS_ROUND3_STEP(d, e, a, b, c, me2, 47); \
+	if (t <= 48) HASHCLASH_SHA1COMPRESS_ROUND3_STEP(c, d, e, a, b, me2, 48); \
+	if (t <= 49) HASHCLASH_SHA1COMPRESS_ROUND3_STEP(b, c, d, e, a, me2, 49); \
+	if (t <= 50) HASHCLASH_SHA1COMPRESS_ROUND3_STEP(a, b, c, d, e, me2, 50); \
+	if (t <= 51) HASHCLASH_SHA1COMPRESS_ROUND3_STEP(e, a, b, c, d, me2, 51); \
+	if (t <= 52) HASHCLASH_SHA1COMPRESS_ROUND3_STEP(d, e, a, b, c, me2, 52); \
+	if (t <= 53) HASHCLASH_SHA1COMPRESS_ROUND3_STEP(c, d, e, a, b, me2, 53); \
+	if (t <= 54) HASHCLASH_SHA1COMPRESS_ROUND3_STEP(b, c, d, e, a, me2, 54); \
+	if (t <= 55) HASHCLASH_SHA1COMPRESS_ROUND3_STEP(a, b, c, d, e, me2, 55); \
+	if (t <= 56) HASHCLASH_SHA1COMPRESS_ROUND3_STEP(e, a, b, c, d, me2, 56); \
+	if (t <= 57) HASHCLASH_SHA1COMPRESS_ROUND3_STEP(d, e, a, b, c, me2, 57); \
+	if (t <= 58) HASHCLASH_SHA1COMPRESS_ROUND3_STEP(c, d, e, a, b, me2, 58); \
+	if (t <= 59) HASHCLASH_SHA1COMPRESS_ROUND3_STEP(b, c, d, e, a, me2, 59); \
+	if (t <= 60) HASHCLASH_SHA1COMPRESS_ROUND4_STEP(a, b, c, d, e, me2, 60); \
+	if (t <= 61) HASHCLASH_SHA1COMPRESS_ROUND4_STEP(e, a, b, c, d, me2, 61); \
+	if (t <= 62) HASHCLASH_SHA1COMPRESS_ROUND4_STEP(d, e, a, b, c, me2, 62); \
+	if (t <= 63) HASHCLASH_SHA1COMPRESS_ROUND4_STEP(c, d, e, a, b, me2, 63); \
+	if (t <= 64) HASHCLASH_SHA1COMPRESS_ROUND4_STEP(b, c, d, e, a, me2, 64); \
+	if (t <= 65) HASHCLASH_SHA1COMPRESS_ROUND4_STEP(a, b, c, d, e, me2, 65); \
+	if (t <= 66) HASHCLASH_SHA1COMPRESS_ROUND4_STEP(e, a, b, c, d, me2, 66); \
+	if (t <= 67) HASHCLASH_SHA1COMPRESS_ROUND4_STEP(d, e, a, b, c, me2, 67); \
+	if (t <= 68) HASHCLASH_SHA1COMPRESS_ROUND4_STEP(c, d, e, a, b, me2, 68); \
+	if (t <= 69) HASHCLASH_SHA1COMPRESS_ROUND4_STEP(b, c, d, e, a, me2, 69); \
+	if (t <= 70) HASHCLASH_SHA1COMPRESS_ROUND4_STEP(a, b, c, d, e, me2, 70); \
+	if (t <= 71) HASHCLASH_SHA1COMPRESS_ROUND4_STEP(e, a, b, c, d, me2, 71); \
+	if (t <= 72) HASHCLASH_SHA1COMPRESS_ROUND4_STEP(d, e, a, b, c, me2, 72); \
+	if (t <= 73) HASHCLASH_SHA1COMPRESS_ROUND4_STEP(c, d, e, a, b, me2, 73); \
+	if (t <= 74) HASHCLASH_SHA1COMPRESS_ROUND4_STEP(b, c, d, e, a, me2, 74); \
+	if (t <= 75) HASHCLASH_SHA1COMPRESS_ROUND4_STEP(a, b, c, d, e, me2, 75); \
+	if (t <= 76) HASHCLASH_SHA1COMPRESS_ROUND4_STEP(e, a, b, c, d, me2, 76); \
+	if (t <= 77) HASHCLASH_SHA1COMPRESS_ROUND4_STEP(d, e, a, b, c, me2, 77); \
+	if (t <= 78) HASHCLASH_SHA1COMPRESS_ROUND4_STEP(c, d, e, a, b, me2, 78); \
+	if (t <= 79) HASHCLASH_SHA1COMPRESS_ROUND4_STEP(b, c, d, e, a, me2, 79); \
+	ihvout[0] = ihvin[0] + a; ihvout[1] = ihvin[1] + b; ihvout[2] = ihvin[2] + c; ihvout[3] = ihvin[3] + d; ihvout[4] = ihvin[4] + e; \
+} 
+
+SHA1_RECOMPRESS(0)
+SHA1_RECOMPRESS(1)
+SHA1_RECOMPRESS(2)
+SHA1_RECOMPRESS(3)
+SHA1_RECOMPRESS(4)
+SHA1_RECOMPRESS(5)
+SHA1_RECOMPRESS(6)
+SHA1_RECOMPRESS(7)
+SHA1_RECOMPRESS(8)
+SHA1_RECOMPRESS(9)
+
+SHA1_RECOMPRESS(10)
+SHA1_RECOMPRESS(11)
+SHA1_RECOMPRESS(12)
+SHA1_RECOMPRESS(13)
+SHA1_RECOMPRESS(14)
+SHA1_RECOMPRESS(15)
+SHA1_RECOMPRESS(16)
+SHA1_RECOMPRESS(17)
+SHA1_RECOMPRESS(18)
+SHA1_RECOMPRESS(19)
+
+SHA1_RECOMPRESS(20)
+SHA1_RECOMPRESS(21)
+SHA1_RECOMPRESS(22)
+SHA1_RECOMPRESS(23)
+SHA1_RECOMPRESS(24)
+SHA1_RECOMPRESS(25)
+SHA1_RECOMPRESS(26)
+SHA1_RECOMPRESS(27)
+SHA1_RECOMPRESS(28)
+SHA1_RECOMPRESS(29)
+
+SHA1_RECOMPRESS(30)
+SHA1_RECOMPRESS(31)
+SHA1_RECOMPRESS(32)
+SHA1_RECOMPRESS(33)
+SHA1_RECOMPRESS(34)
+SHA1_RECOMPRESS(35)
+SHA1_RECOMPRESS(36)
+SHA1_RECOMPRESS(37)
+SHA1_RECOMPRESS(38)
+SHA1_RECOMPRESS(39)
+
+SHA1_RECOMPRESS(40)
+SHA1_RECOMPRESS(41)
+SHA1_RECOMPRESS(42)
+SHA1_RECOMPRESS(43)
+SHA1_RECOMPRESS(44)
+SHA1_RECOMPRESS(45)
+SHA1_RECOMPRESS(46)
+SHA1_RECOMPRESS(47)
+SHA1_RECOMPRESS(48)
+SHA1_RECOMPRESS(49)
+
+SHA1_RECOMPRESS(50)
+SHA1_RECOMPRESS(51)
+SHA1_RECOMPRESS(52)
+SHA1_RECOMPRESS(53)
+SHA1_RECOMPRESS(54)
+SHA1_RECOMPRESS(55)
+SHA1_RECOMPRESS(56)
+SHA1_RECOMPRESS(57)
+SHA1_RECOMPRESS(58)
+SHA1_RECOMPRESS(59)
+
+SHA1_RECOMPRESS(60)
+SHA1_RECOMPRESS(61)
+SHA1_RECOMPRESS(62)
+SHA1_RECOMPRESS(63)
+SHA1_RECOMPRESS(64)
+SHA1_RECOMPRESS(65)
+SHA1_RECOMPRESS(66)
+SHA1_RECOMPRESS(67)
+SHA1_RECOMPRESS(68)
+SHA1_RECOMPRESS(69)
+
+SHA1_RECOMPRESS(70)
+SHA1_RECOMPRESS(71)
+SHA1_RECOMPRESS(72)
+SHA1_RECOMPRESS(73)
+SHA1_RECOMPRESS(74)
+SHA1_RECOMPRESS(75)
+SHA1_RECOMPRESS(76)
+SHA1_RECOMPRESS(77)
+SHA1_RECOMPRESS(78)
+SHA1_RECOMPRESS(79)
+
+sha1_recompression_type sha1_recompression_step[80] =
+{
+	sha1recompress_fast_0, sha1recompress_fast_1, sha1recompress_fast_2, sha1recompress_fast_3, sha1recompress_fast_4, sha1recompress_fast_5, sha1recompress_fast_6, sha1recompress_fast_7, sha1recompress_fast_8, sha1recompress_fast_9,
+	sha1recompress_fast_10, sha1recompress_fast_11, sha1recompress_fast_12, sha1recompress_fast_13, sha1recompress_fast_14, sha1recompress_fast_15, sha1recompress_fast_16, sha1recompress_fast_17, sha1recompress_fast_18, sha1recompress_fast_19,
+	sha1recompress_fast_20, sha1recompress_fast_21, sha1recompress_fast_22, sha1recompress_fast_23, sha1recompress_fast_24, sha1recompress_fast_25, sha1recompress_fast_26, sha1recompress_fast_27, sha1recompress_fast_28, sha1recompress_fast_29,
+	sha1recompress_fast_30, sha1recompress_fast_31, sha1recompress_fast_32, sha1recompress_fast_33, sha1recompress_fast_34, sha1recompress_fast_35, sha1recompress_fast_36, sha1recompress_fast_37, sha1recompress_fast_38, sha1recompress_fast_39,
+	sha1recompress_fast_40, sha1recompress_fast_41, sha1recompress_fast_42, sha1recompress_fast_43, sha1recompress_fast_44, sha1recompress_fast_45, sha1recompress_fast_46, sha1recompress_fast_47, sha1recompress_fast_48, sha1recompress_fast_49,
+	sha1recompress_fast_50, sha1recompress_fast_51, sha1recompress_fast_52, sha1recompress_fast_53, sha1recompress_fast_54, sha1recompress_fast_55, sha1recompress_fast_56, sha1recompress_fast_57, sha1recompress_fast_58, sha1recompress_fast_59,
+	sha1recompress_fast_60, sha1recompress_fast_61, sha1recompress_fast_62, sha1recompress_fast_63, sha1recompress_fast_64, sha1recompress_fast_65, sha1recompress_fast_66, sha1recompress_fast_67, sha1recompress_fast_68, sha1recompress_fast_69,
+	sha1recompress_fast_70, sha1recompress_fast_71, sha1recompress_fast_72, sha1recompress_fast_73, sha1recompress_fast_74, sha1recompress_fast_75, sha1recompress_fast_76, sha1recompress_fast_77, sha1recompress_fast_78, sha1recompress_fast_79,
+};
+
+
+
+
+
+void sha1_process(SHA1_CTX* ctx, const uint32_t block[16]) 
+{
+	unsigned i, j;
+	uint32_t ubc_dv_mask[DVMASKSIZE];
+	uint32_t ihvtmp[5];
+	for (i=0; i < DVMASKSIZE; ++i)
+		ubc_dv_mask[i]=0;
+	ctx->ihv1[0] = ctx->ihv[0];
+	ctx->ihv1[1] = ctx->ihv[1];
+	ctx->ihv1[2] = ctx->ihv[2];
+	ctx->ihv1[3] = ctx->ihv[3];
+	ctx->ihv1[4] = ctx->ihv[4];
+	memcpy(ctx->m1, block, 64);
+	sha1_message_expansion(ctx->m1);
+	if (ctx->detect_coll && ctx->ubc_check)
+	{
+		ubc_check(ctx->m1, ubc_dv_mask);
+	}
+	sha1_compression_states(ctx->ihv, ctx->m1, ctx->states);
+	if (ctx->detect_coll)
+	{
+		for (i = 0; sha1_dvs[i].dvType != 0; ++i) 
+		{
+			if ((0 == ctx->ubc_check) || (((uint32_t)(1) << sha1_dvs[i].maskb) & ubc_dv_mask[sha1_dvs[i].maski]))
+			{
+				for (j = 0; j < 80; ++j)
+					ctx->m2[j] = ctx->m1[j] ^ sha1_dvs[i].dm[j];
+				(sha1_recompression_step[sha1_dvs[i].testt])(ctx->ihv2, ihvtmp, ctx->m2, ctx->states[sha1_dvs[i].testt]);
+				// to verify SHA-1 collision detection code with collisions for reduced-step SHA-1
+				if ((ihvtmp[0] == ctx->ihv[0] && ihvtmp[1] == ctx->ihv[1] && ihvtmp[2] == ctx->ihv[2] && ihvtmp[3] == ctx->ihv[3] && ihvtmp[4] == ctx->ihv[4])
+					|| (ctx->reduced_round_coll && ctx->ihv1[0] == ctx->ihv2[0] && ctx->ihv1[1] == ctx->ihv2[1] && ctx->ihv1[2] == ctx->ihv2[2] && ctx->ihv1[3] == ctx->ihv2[3] && ctx->ihv1[4] == ctx->ihv2[4]))
+				{
+					ctx->found_collision = 1;
+					// a collision block was found: report it via the callback, if set
+					if (ctx->callback != NULL)
+						ctx->callback(ctx->total - 64, ctx->ihv1, ctx->ihv2, ctx->m1, ctx->m2);
+
+					if (ctx->safe_hash) 
+					{
+						sha1_compression_W(ctx->ihv, ctx->m1);
+						sha1_compression_W(ctx->ihv, ctx->m1);
+					}
+
+					break;
+				}
+			}
+		}
+	}
+}
+
+
+
+
+
+void swap_bytes(uint32_t val[16]) 
+{
+	unsigned i;
+	for (i = 0; i < 16; ++i) 
+	{
+		val[i] = ((val[i] << 8) & 0xFF00FF00) | ((val[i] >> 8) & 0xFF00FF);
+		val[i] = (val[i] << 16) | (val[i] >> 16);
+	}
+}
+
+void SHA1DCInit(SHA1_CTX* ctx) 
+{
+	static const union { unsigned char bytes[4]; uint32_t value; } endianness = { { 0, 1, 2, 3 } };
+	static const uint32_t littleendian = 0x03020100;
+	ctx->total = 0;
+	ctx->ihv[0] = 0x67452301;
+	ctx->ihv[1] = 0xEFCDAB89;
+	ctx->ihv[2] = 0x98BADCFE;
+	ctx->ihv[3] = 0x10325476;
+	ctx->ihv[4] = 0xC3D2E1F0;
+	ctx->found_collision = 0;
+	ctx->safe_hash = 1;
+	ctx->ubc_check = 1;
+	ctx->detect_coll = 1;
+	ctx->reduced_round_coll = 0;
+	ctx->bigendian = (endianness.value != littleendian);
+	ctx->callback = NULL;
+}
+
+void SHA1DCSetSafeHash(SHA1_CTX* ctx, int safehash)
+{
+	if (safehash)
+		ctx->safe_hash = 1;
+	else
+		ctx->safe_hash = 0;
+}
+
+
+void SHA1DCSetUseUBC(SHA1_CTX* ctx, int ubc_check)
+{
+	if (ubc_check)
+		ctx->ubc_check = 1;
+	else
+		ctx->ubc_check = 0;
+}
+
+void SHA1DCSetUseDetectColl(SHA1_CTX* ctx, int detect_coll)
+{
+	if (detect_coll)
+		ctx->detect_coll = 1;
+	else
+		ctx->detect_coll = 0;
+}
+
+void SHA1DCSetDetectReducedRoundCollision(SHA1_CTX* ctx, int reduced_round_coll)
+{
+	if (reduced_round_coll)
+		ctx->reduced_round_coll = 1;
+	else
+		ctx->reduced_round_coll = 0;
+}
+
+void SHA1DCSetCallback(SHA1_CTX* ctx, collision_block_callback callback)
+{
+	ctx->callback = callback;
+}
+
+void SHA1DCUpdate(SHA1_CTX* ctx, const char* buf, unsigned len) 
+{
+	unsigned left, fill;
+	if (len == 0) 
+		return;
+
+	left = ctx->total & 63;
+	fill = 64 - left;
+
+	if (left && len >= fill) 
+	{
+		ctx->total += fill;
+		memcpy(ctx->buffer + left, buf, fill);
+		if (!ctx->bigendian)
+			swap_bytes((uint32_t*)(ctx->buffer));
+		sha1_process(ctx, (uint32_t*)(ctx->buffer));
+		buf += fill;
+		len -= fill;
+		left = 0;
+	}
+	while (len >= 64) 
+	{
+		ctx->total += 64;
+		if (!ctx->bigendian) 
+		{
+			memcpy(ctx->buffer, buf, 64);
+			swap_bytes((uint32_t*)(ctx->buffer));
+			sha1_process(ctx, (uint32_t*)(ctx->buffer));
+		}
+		else
+			sha1_process(ctx, (uint32_t*)(buf));
+		buf += 64;
+		len -= 64;
+	}
+	if (len > 0) 
+	{
+		ctx->total += len;
+		memcpy(ctx->buffer + left, buf, len);
+	}
+}
+
+static const unsigned char sha1_padding[64] =
+{
+	0x80, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
+	0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
+	0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
+	0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
+};
+
+int SHA1DCFinal(unsigned char output[20], SHA1_CTX *ctx)
+{
+	uint32_t last = ctx->total & 63;
+	uint32_t padn = (last < 56) ? (56 - last) : (120 - last);
+	uint64_t total;
+	SHA1DCUpdate(ctx, (const char*)(sha1_padding), padn);
+	
+	total = ctx->total - padn;
+	total <<= 3;
+	ctx->buffer[56] = (unsigned char)(total >> 56);
+	ctx->buffer[57] = (unsigned char)(total >> 48);
+	ctx->buffer[58] = (unsigned char)(total >> 40);
+	ctx->buffer[59] = (unsigned char)(total >> 32);
+	ctx->buffer[60] = (unsigned char)(total >> 24);
+	ctx->buffer[61] = (unsigned char)(total >> 16);
+	ctx->buffer[62] = (unsigned char)(total >> 8);
+	ctx->buffer[63] = (unsigned char)(total);
+	if (!ctx->bigendian)
+		swap_bytes((uint32_t*)(ctx->buffer));
+	sha1_process(ctx, (uint32_t*)(ctx->buffer));
+	output[0] = (unsigned char)(ctx->ihv[0] >> 24);
+	output[1] = (unsigned char)(ctx->ihv[0] >> 16);
+	output[2] = (unsigned char)(ctx->ihv[0] >> 8);
+	output[3] = (unsigned char)(ctx->ihv[0]);
+	output[4] = (unsigned char)(ctx->ihv[1] >> 24);
+	output[5] = (unsigned char)(ctx->ihv[1] >> 16);
+	output[6] = (unsigned char)(ctx->ihv[1] >> 8);
+	output[7] = (unsigned char)(ctx->ihv[1]);
+	output[8] = (unsigned char)(ctx->ihv[2] >> 24);
+	output[9] = (unsigned char)(ctx->ihv[2] >> 16);
+	output[10] = (unsigned char)(ctx->ihv[2] >> 8);
+	output[11] = (unsigned char)(ctx->ihv[2]);
+	output[12] = (unsigned char)(ctx->ihv[3] >> 24);
+	output[13] = (unsigned char)(ctx->ihv[3] >> 16);
+	output[14] = (unsigned char)(ctx->ihv[3] >> 8);
+	output[15] = (unsigned char)(ctx->ihv[3]);
+	output[16] = (unsigned char)(ctx->ihv[4] >> 24);
+	output[17] = (unsigned char)(ctx->ihv[4] >> 16);
+	output[18] = (unsigned char)(ctx->ihv[4] >> 8);
+	output[19] = (unsigned char)(ctx->ihv[4]);
+	return ctx->found_collision;
+}
diff --git a/sha1dc/sha1.h b/sha1dc/sha1.h
new file mode 100644
index 000000000..8b522f9d2
--- /dev/null
+++ b/sha1dc/sha1.h
@@ -0,0 +1,94 @@
+/***
+* Copyright 2017 Marc Stevens <marc@marc-stevens.nl>, Dan Shumow <danshu@microsoft.com>
+* Distributed under the MIT Software License.
+* See accompanying file LICENSE.txt or copy at
+* https://opensource.org/licenses/MIT
+***/
+
+#include <stdint.h>
+
+// uses SHA-1 message expansion to expand the first 16 words of W[] to 80 words
+void sha1_message_expansion(uint32_t W[80]);
+
+// sha-1 compression function; first version takes a message block pre-parsed as 16 32-bit integers, second version takes an already expanded message)
+void sha1_compression(uint32_t ihv[5], const uint32_t m[16]);
+void sha1_compression_W(uint32_t ihv[5], const uint32_t W[80]);
+
+// same as sha1_compression_W, but additionally store intermediate states
+// only stores states ii (the state between step ii-1 and step ii) when DOSTORESTATEii is defined in ubc_check.h
+void sha1_compression_states(uint32_t ihv[5], const uint32_t W[80], uint32_t states[80][5]);
+
+// function type for sha1_recompression_step_T (uint32_t ihvin[5], uint32_t ihvout[5], const uint32_t me2[80], const uint32_t state[5])
+// where 0 <= T < 80
+//       me2 is an expanded message (the expansion of an original message block XOR'ed with a disturbance vector's message block difference)
+//       state is the internal state (a,b,c,d,e) before step T of the SHA-1 compression function while processing the original message block
+// the function will return:
+//       ihvin: the reconstructed input chaining value
+//       ihvout: the reconstructed output chaining value
+typedef void(*sha1_recompression_type)(uint32_t*, uint32_t*, const uint32_t*, const uint32_t*);
+
+// table of sha1_recompression_step_0, ... , sha1_recompression_step_79
+extern sha1_recompression_type sha1_recompression_step[80];
+
+// a callback function type that can be set to be called when a collision block has been found:
+// void collision_block_callback(uint64_t byteoffset, const uint32_t ihvin1[5], const uint32_t ihvin2[5], const uint32_t m1[80], const uint32_t m2[80])
+typedef void(*collision_block_callback)(uint64_t, const uint32_t*, const uint32_t*, const uint32_t*, const uint32_t*);
+
+// the SHA-1 context
+typedef struct {
+	uint64_t total;
+	uint32_t ihv[5];
+	unsigned char buffer[64];
+	int bigendian;
+	int found_collision;
+	int safe_hash;
+	int detect_coll;
+	int ubc_check;
+	int reduced_round_coll;
+	collision_block_callback callback;
+
+	uint32_t ihv1[5];
+	uint32_t ihv2[5];
+	uint32_t m1[80];
+	uint32_t m2[80];
+	uint32_t states[80][5];
+} SHA1_CTX;
+
+// initialize SHA-1 context
+void SHA1DCInit(SHA1_CTX*); 
+
+// function to enable safe SHA-1 hashing:
+// collision attacks are thwarted by hashing a detected near-collision block 3 times
+// think of it as extending SHA-1 from 80-steps to 240-steps for such blocks:
+//   the best collision attacks against SHA-1 have complexity about 2^60, 
+//   thus for 240-steps an immediate lower-bound for the best cryptanalytic attacks would be 2^180
+//   an attacker would be better off using a generic birthday search of complexity 2^80
+//
+// enabling safe SHA-1 hashing will result in the correct SHA-1 hash for messages where no collision attack was detected
+// but it will result in a different SHA-1 hash for messages where a collision attack was detected 
+// this will automatically invalidate SHA-1 based digital signature forgeries
+// enabled by default
+void SHA1DCSetSafeHash(SHA1_CTX*, int);
+
+// function to disable or enable the use of Unavoidable Bitconditions (provides a significant speed up)
+// enabled by default
+void SHA1DCSetUseUBC(SHA1_CTX*, int);
+
+// function to disable or enable the use of Collision Detection
+// enabled by default
+void SHA1DCSetUseDetectColl(SHA1_CTX* ctx, int detect_coll);
+
+// function to disable or enable the detection of reduced-round SHA-1 collisions
+// disabled by default
+void SHA1DCSetDetectReducedRoundCollision(SHA1_CTX*, int);
+
+// function to set a callback function, pass NULL to disable
+// by default no callback set
+void SHA1DCSetCallback(SHA1_CTX*, collision_block_callback);
+
+// update SHA-1 context with buffer contents
+void SHA1DCUpdate(SHA1_CTX*, const char*, unsigned);
+
+// obtain SHA-1 hash from SHA-1 context
+// returns: 0 = no collision detected, otherwise = collision found => warn user for active attack
+int  SHA1DCFinal(unsigned char[20], SHA1_CTX*); 
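
As an aside, the header above is self-contained enough to try outside
of git; a minimal caller (my sketch, not part of the patch) just hashes
stdin and checks the collision flag returned by SHA1DCFinal():

  #include <stdio.h>
  #include "sha1dc/sha1.h"

  int main(void)
  {
  	SHA1_CTX ctx;
  	unsigned char hash[20];
  	char buf[4096];
  	size_t n;
  	int i;

  	SHA1DCInit(&ctx); /* safe_hash, ubc_check, detect_coll default on */
  	while ((n = fread(buf, 1, sizeof(buf), stdin)) > 0)
  		SHA1DCUpdate(&ctx, buf, (unsigned)n);
  	if (SHA1DCFinal(hash, &ctx)) {
  		/* nonzero return means a collision attack was detected */
  		fprintf(stderr, "collision detected\n");
  		return 1;
  	}
  	for (i = 0; i < 20; i++)
  		printf("%02x", hash[i]);
  	printf("\n");
  	return 0;
  }
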
diff --git a/sha1dc/ubc_check.c b/sha1dc/ubc_check.c
new file mode 100644
index 000000000..556aaf3c5
--- /dev/null
+++ b/sha1dc/ubc_check.c
@@ -0,0 +1,361 @@
+/***
+* Copyright 2017 Marc Stevens <marc@marc-stevens.nl>, Dan Shumow <danshu@microsoft.com>
+* Distributed under the MIT Software License.
+* See accompanying file LICENSE.txt or copy at
+* https://opensource.org/licenses/MIT
+***/
+
+// this file was generated by the 'parse_bitrel' program in the tools section
+// using the data files from directory 'tools/data/3565'
+//
+// sha1_dvs contains a list of SHA-1 Disturbance Vectors (DV) to check
+// dvType, dvK and dvB define the DV: I(K,B) or II(K,B) (see the paper)
+// dm[80] is the expanded message block XOR-difference defined by the DV
+// testt is the step to do the recompression from for collision detection
+// maski and maskb define the bit to check for each DV in the dvmask returned by ubc_check
+//
+// ubc_check takes as input an expanded message block and verifies the unavoidable bitconditions for all listed DVs
+// it returns a dvmask where each bit belonging to a DV is set if all unavoidable bitconditions for that DV have been met
+// thus one needs to do the recompression check for each DV that has its bit set
+// 
+// ubc_check is programmatically generated and the unavoidable bitconditions have been hardcoded
+// a directly verifiable version named ubc_check_verify can be found in ubc_check_verify.c
+// ubc_check has been verified against ubc_check_verify using the 'ubc_check_test' program in the tools section
+
+#include <stdint.h>
+#include "ubc_check.h"
+
+static const uint32_t DV_I_43_0_bit 	= (uint32_t)(1) << 0;
+static const uint32_t DV_I_44_0_bit 	= (uint32_t)(1) << 1;
+static const uint32_t DV_I_45_0_bit 	= (uint32_t)(1) << 2;
+static const uint32_t DV_I_46_0_bit 	= (uint32_t)(1) << 3;
+static const uint32_t DV_I_46_2_bit 	= (uint32_t)(1) << 4;
+static const uint32_t DV_I_47_0_bit 	= (uint32_t)(1) << 5;
+static const uint32_t DV_I_47_2_bit 	= (uint32_t)(1) << 6;
+static const uint32_t DV_I_48_0_bit 	= (uint32_t)(1) << 7;
+static const uint32_t DV_I_48_2_bit 	= (uint32_t)(1) << 8;
+static const uint32_t DV_I_49_0_bit 	= (uint32_t)(1) << 9;
+static const uint32_t DV_I_49_2_bit 	= (uint32_t)(1) << 10;
+static const uint32_t DV_I_50_0_bit 	= (uint32_t)(1) << 11;
+static const uint32_t DV_I_50_2_bit 	= (uint32_t)(1) << 12;
+static const uint32_t DV_I_51_0_bit 	= (uint32_t)(1) << 13;
+static const uint32_t DV_I_51_2_bit 	= (uint32_t)(1) << 14;
+static const uint32_t DV_I_52_0_bit 	= (uint32_t)(1) << 15;
+static const uint32_t DV_II_45_0_bit 	= (uint32_t)(1) << 16;
+static const uint32_t DV_II_46_0_bit 	= (uint32_t)(1) << 17;
+static const uint32_t DV_II_46_2_bit 	= (uint32_t)(1) << 18;
+static const uint32_t DV_II_47_0_bit 	= (uint32_t)(1) << 19;
+static const uint32_t DV_II_48_0_bit 	= (uint32_t)(1) << 20;
+static const uint32_t DV_II_49_0_bit 	= (uint32_t)(1) << 21;
+static const uint32_t DV_II_49_2_bit 	= (uint32_t)(1) << 22;
+static const uint32_t DV_II_50_0_bit 	= (uint32_t)(1) << 23;
+static const uint32_t DV_II_50_2_bit 	= (uint32_t)(1) << 24;
+static const uint32_t DV_II_51_0_bit 	= (uint32_t)(1) << 25;
+static const uint32_t DV_II_51_2_bit 	= (uint32_t)(1) << 26;
+static const uint32_t DV_II_52_0_bit 	= (uint32_t)(1) << 27;
+static const uint32_t DV_II_53_0_bit 	= (uint32_t)(1) << 28;
+static const uint32_t DV_II_54_0_bit 	= (uint32_t)(1) << 29;
+static const uint32_t DV_II_55_0_bit 	= (uint32_t)(1) << 30;
+static const uint32_t DV_II_56_0_bit 	= (uint32_t)(1) << 31;
+
+dv_info_t sha1_dvs[] = 
+{
+  {1,43,0,58,0,0, { 0x08000000,0x9800000c,0xd8000010,0x08000010,0xb8000010,0x98000000,0x60000000,0x00000008,0xc0000000,0x90000014,0x10000010,0xb8000014,0x28000000,0x20000010,0x48000000,0x08000018,0x60000000,0x90000010,0xf0000010,0x90000008,0xc0000000,0x90000010,0xf0000010,0xb0000008,0x40000000,0x90000000,0xf0000010,0x90000018,0x60000000,0x90000010,0x90000010,0x90000000,0x80000000,0x00000010,0xa0000000,0x20000000,0xa0000000,0x20000010,0x00000000,0x20000010,0x20000000,0x00000010,0x20000000,0x00000010,0xa0000000,0x00000000,0x20000000,0x20000000,0x00000000,0x00000000,0x00000000,0x00000000,0x00000000,0x00000000,0x00000000,0x00000000,0x00000000,0x00000000,0x00000001,0x00000020,0x00000001,0x40000002,0x40000040,0x40000002,0x80000004,0x80000080,0x80000006,0x00000049,0x00000103,0x80000009,0x80000012,0x80000202,0x00000018,0x00000164,0x00000408,0x800000e6,0x8000004c,0x00000803,0x80000161,0x80000599 } }
+, {1,44,0,58,0,1, { 0xb4000008,0x08000000,0x9800000c,0xd8000010,0x08000010,0xb8000010,0x98000000,0x60000000,0x00000008,0xc0000000,0x90000014,0x10000010,0xb8000014,0x28000000,0x20000010,0x48000000,0x08000018,0x60000000,0x90000010,0xf0000010,0x90000008,0xc0000000,0x90000010,0xf0000010,0xb0000008,0x40000000,0x90000000,0xf0000010,0x90000018,0x60000000,0x90000010,0x90000010,0x90000000,0x80000000,0x00000010,0xa0000000,0x20000000,0xa0000000,0x20000010,0x00000000,0x20000010,0x20000000,0x00000010,0x20000000,0x00000010,0xa0000000,0x00000000,0x20000000,0x20000000,0x00000000,0x00000000,0x00000000,0x00000000,0x00000000,0x00000000,0x00000000,0x00000000,0x00000000,0x00000000,0x00000001,0x00000020,0x00000001,0x40000002,0x40000040,0x40000002,0x80000004,0x80000080,0x80000006,0x00000049,0x00000103,0x80000009,0x80000012,0x80000202,0x00000018,0x00000164,0x00000408,0x800000e6,0x8000004c,0x00000803,0x80000161 } }
+, {1,45,0,58,0,2, { 0xf4000014,0xb4000008,0x08000000,0x9800000c,0xd8000010,0x08000010,0xb8000010,0x98000000,0x60000000,0x00000008,0xc0000000,0x90000014,0x10000010,0xb8000014,0x28000000,0x20000010,0x48000000,0x08000018,0x60000000,0x90000010,0xf0000010,0x90000008,0xc0000000,0x90000010,0xf0000010,0xb0000008,0x40000000,0x90000000,0xf0000010,0x90000018,0x60000000,0x90000010,0x90000010,0x90000000,0x80000000,0x00000010,0xa0000000,0x20000000,0xa0000000,0x20000010,0x00000000,0x20000010,0x20000000,0x00000010,0x20000000,0x00000010,0xa0000000,0x00000000,0x20000000,0x20000000,0x00000000,0x00000000,0x00000000,0x00000000,0x00000000,0x00000000,0x00000000,0x00000000,0x00000000,0x00000000,0x00000001,0x00000020,0x00000001,0x40000002,0x40000040,0x40000002,0x80000004,0x80000080,0x80000006,0x00000049,0x00000103,0x80000009,0x80000012,0x80000202,0x00000018,0x00000164,0x00000408,0x800000e6,0x8000004c,0x00000803 } }
+, {1,46,0,58,0,3, { 0x2c000010,0xf4000014,0xb4000008,0x08000000,0x9800000c,0xd8000010,0x08000010,0xb8000010,0x98000000,0x60000000,0x00000008,0xc0000000,0x90000014,0x10000010,0xb8000014,0x28000000,0x20000010,0x48000000,0x08000018,0x60000000,0x90000010,0xf0000010,0x90000008,0xc0000000,0x90000010,0xf0000010,0xb0000008,0x40000000,0x90000000,0xf0000010,0x90000018,0x60000000,0x90000010,0x90000010,0x90000000,0x80000000,0x00000010,0xa0000000,0x20000000,0xa0000000,0x20000010,0x00000000,0x20000010,0x20000000,0x00000010,0x20000000,0x00000010,0xa0000000,0x00000000,0x20000000,0x20000000,0x00000000,0x00000000,0x00000000,0x00000000,0x00000000,0x00000000,0x00000000,0x00000000,0x00000000,0x00000000,0x00000001,0x00000020,0x00000001,0x40000002,0x40000040,0x40000002,0x80000004,0x80000080,0x80000006,0x00000049,0x00000103,0x80000009,0x80000012,0x80000202,0x00000018,0x00000164,0x00000408,0x800000e6,0x8000004c } }
+, {1,46,2,58,0,4, { 0xb0000040,0xd0000053,0xd0000022,0x20000000,0x60000032,0x60000043,0x20000040,0xe0000042,0x60000002,0x80000001,0x00000020,0x00000003,0x40000052,0x40000040,0xe0000052,0xa0000000,0x80000040,0x20000001,0x20000060,0x80000001,0x40000042,0xc0000043,0x40000022,0x00000003,0x40000042,0xc0000043,0xc0000022,0x00000001,0x40000002,0xc0000043,0x40000062,0x80000001,0x40000042,0x40000042,0x40000002,0x00000002,0x00000040,0x80000002,0x80000000,0x80000002,0x80000040,0x00000000,0x80000040,0x80000000,0x00000040,0x80000000,0x00000040,0x80000002,0x00000000,0x80000000,0x80000000,0x00000000,0x00000000,0x00000000,0x00000000,0x00000000,0x00000000,0x00000000,0x00000000,0x00000000,0x00000000,0x00000004,0x00000080,0x00000004,0x00000009,0x00000101,0x00000009,0x00000012,0x00000202,0x0000001a,0x00000124,0x0000040c,0x00000026,0x0000004a,0x0000080a,0x00000060,0x00000590,0x00001020,0x0000039a,0x00000132 } }
+, {1,47,0,58,0,5, { 0xc8000010,0x2c000010,0xf4000014,0xb4000008,0x08000000,0x9800000c,0xd8000010,0x08000010,0xb8000010,0x98000000,0x60000000,0x00000008,0xc0000000,0x90000014,0x10000010,0xb8000014,0x28000000,0x20000010,0x48000000,0x08000018,0x60000000,0x90000010,0xf0000010,0x90000008,0xc0000000,0x90000010,0xf0000010,0xb0000008,0x40000000,0x90000000,0xf0000010,0x90000018,0x60000000,0x90000010,0x90000010,0x90000000,0x80000000,0x00000010,0xa0000000,0x20000000,0xa0000000,0x20000010,0x00000000,0x20000010,0x20000000,0x00000010,0x20000000,0x00000010,0xa0000000,0x00000000,0x20000000,0x20000000,0x00000000,0x00000000,0x00000000,0x00000000,0x00000000,0x00000000,0x00000000,0x00000000,0x00000000,0x00000000,0x00000001,0x00000020,0x00000001,0x40000002,0x40000040,0x40000002,0x80000004,0x80000080,0x80000006,0x00000049,0x00000103,0x80000009,0x80000012,0x80000202,0x00000018,0x00000164,0x00000408,0x800000e6 } }
+, {1,47,2,58,0,6, { 0x20000043,0xb0000040,0xd0000053,0xd0000022,0x20000000,0x60000032,0x60000043,0x20000040,0xe0000042,0x60000002,0x80000001,0x00000020,0x00000003,0x40000052,0x40000040,0xe0000052,0xa0000000,0x80000040,0x20000001,0x20000060,0x80000001,0x40000042,0xc0000043,0x40000022,0x00000003,0x40000042,0xc0000043,0xc0000022,0x00000001,0x40000002,0xc0000043,0x40000062,0x80000001,0x40000042,0x40000042,0x40000002,0x00000002,0x00000040,0x80000002,0x80000000,0x80000002,0x80000040,0x00000000,0x80000040,0x80000000,0x00000040,0x80000000,0x00000040,0x80000002,0x00000000,0x80000000,0x80000000,0x00000000,0x00000000,0x00000000,0x00000000,0x00000000,0x00000000,0x00000000,0x00000000,0x00000000,0x00000000,0x00000004,0x00000080,0x00000004,0x00000009,0x00000101,0x00000009,0x00000012,0x00000202,0x0000001a,0x00000124,0x0000040c,0x00000026,0x0000004a,0x0000080a,0x00000060,0x00000590,0x00001020,0x0000039a } }
+, {1,48,0,58,0,7, { 0xb800000a,0xc8000010,0x2c000010,0xf4000014,0xb4000008,0x08000000,0x9800000c,0xd8000010,0x08000010,0xb8000010,0x98000000,0x60000000,0x00000008,0xc0000000,0x90000014,0x10000010,0xb8000014,0x28000000,0x20000010,0x48000000,0x08000018,0x60000000,0x90000010,0xf0000010,0x90000008,0xc0000000,0x90000010,0xf0000010,0xb0000008,0x40000000,0x90000000,0xf0000010,0x90000018,0x60000000,0x90000010,0x90000010,0x90000000,0x80000000,0x00000010,0xa0000000,0x20000000,0xa0000000,0x20000010,0x00000000,0x20000010,0x20000000,0x00000010,0x20000000,0x00000010,0xa0000000,0x00000000,0x20000000,0x20000000,0x00000000,0x00000000,0x00000000,0x00000000,0x00000000,0x00000000,0x00000000,0x00000000,0x00000000,0x00000000,0x00000001,0x00000020,0x00000001,0x40000002,0x40000040,0x40000002,0x80000004,0x80000080,0x80000006,0x00000049,0x00000103,0x80000009,0x80000012,0x80000202,0x00000018,0x00000164,0x00000408 } }
+, {1,48,2,58,0,8, { 0xe000002a,0x20000043,0xb0000040,0xd0000053,0xd0000022,0x20000000,0x60000032,0x60000043,0x20000040,0xe0000042,0x60000002,0x80000001,0x00000020,0x00000003,0x40000052,0x40000040,0xe0000052,0xa0000000,0x80000040,0x20000001,0x20000060,0x80000001,0x40000042,0xc0000043,0x40000022,0x00000003,0x40000042,0xc0000043,0xc0000022,0x00000001,0x40000002,0xc0000043,0x40000062,0x80000001,0x40000042,0x40000042,0x40000002,0x00000002,0x00000040,0x80000002,0x80000000,0x80000002,0x80000040,0x00000000,0x80000040,0x80000000,0x00000040,0x80000000,0x00000040,0x80000002,0x00000000,0x80000000,0x80000000,0x00000000,0x00000000,0x00000000,0x00000000,0x00000000,0x00000000,0x00000000,0x00000000,0x00000000,0x00000000,0x00000004,0x00000080,0x00000004,0x00000009,0x00000101,0x00000009,0x00000012,0x00000202,0x0000001a,0x00000124,0x0000040c,0x00000026,0x0000004a,0x0000080a,0x00000060,0x00000590,0x00001020 } }
+, {1,49,0,58,0,9, { 0x18000000,0xb800000a,0xc8000010,0x2c000010,0xf4000014,0xb4000008,0x08000000,0x9800000c,0xd8000010,0x08000010,0xb8000010,0x98000000,0x60000000,0x00000008,0xc0000000,0x90000014,0x10000010,0xb8000014,0x28000000,0x20000010,0x48000000,0x08000018,0x60000000,0x90000010,0xf0000010,0x90000008,0xc0000000,0x90000010,0xf0000010,0xb0000008,0x40000000,0x90000000,0xf0000010,0x90000018,0x60000000,0x90000010,0x90000010,0x90000000,0x80000000,0x00000010,0xa0000000,0x20000000,0xa0000000,0x20000010,0x00000000,0x20000010,0x20000000,0x00000010,0x20000000,0x00000010,0xa0000000,0x00000000,0x20000000,0x20000000,0x00000000,0x00000000,0x00000000,0x00000000,0x00000000,0x00000000,0x00000000,0x00000000,0x00000000,0x00000000,0x00000001,0x00000020,0x00000001,0x40000002,0x40000040,0x40000002,0x80000004,0x80000080,0x80000006,0x00000049,0x00000103,0x80000009,0x80000012,0x80000202,0x00000018,0x00000164 } }
+, {1,49,2,58,0,10, { 0x60000000,0xe000002a,0x20000043,0xb0000040,0xd0000053,0xd0000022,0x20000000,0x60000032,0x60000043,0x20000040,0xe0000042,0x60000002,0x80000001,0x00000020,0x00000003,0x40000052,0x40000040,0xe0000052,0xa0000000,0x80000040,0x20000001,0x20000060,0x80000001,0x40000042,0xc0000043,0x40000022,0x00000003,0x40000042,0xc0000043,0xc0000022,0x00000001,0x40000002,0xc0000043,0x40000062,0x80000001,0x40000042,0x40000042,0x40000002,0x00000002,0x00000040,0x80000002,0x80000000,0x80000002,0x80000040,0x00000000,0x80000040,0x80000000,0x00000040,0x80000000,0x00000040,0x80000002,0x00000000,0x80000000,0x80000000,0x00000000,0x00000000,0x00000000,0x00000000,0x00000000,0x00000000,0x00000000,0x00000000,0x00000000,0x00000000,0x00000004,0x00000080,0x00000004,0x00000009,0x00000101,0x00000009,0x00000012,0x00000202,0x0000001a,0x00000124,0x0000040c,0x00000026,0x0000004a,0x0000080a,0x00000060,0x00000590 } }
+, {1,50,0,65,0,11, { 0x0800000c,0x18000000,0xb800000a,0xc8000010,0x2c000010,0xf4000014,0xb4000008,0x08000000,0x9800000c,0xd8000010,0x08000010,0xb8000010,0x98000000,0x60000000,0x00000008,0xc0000000,0x90000014,0x10000010,0xb8000014,0x28000000,0x20000010,0x48000000,0x08000018,0x60000000,0x90000010,0xf0000010,0x90000008,0xc0000000,0x90000010,0xf0000010,0xb0000008,0x40000000,0x90000000,0xf0000010,0x90000018,0x60000000,0x90000010,0x90000010,0x90000000,0x80000000,0x00000010,0xa0000000,0x20000000,0xa0000000,0x20000010,0x00000000,0x20000010,0x20000000,0x00000010,0x20000000,0x00000010,0xa0000000,0x00000000,0x20000000,0x20000000,0x00000000,0x00000000,0x00000000,0x00000000,0x00000000,0x00000000,0x00000000,0x00000000,0x00000000,0x00000000,0x00000001,0x00000020,0x00000001,0x40000002,0x40000040,0x40000002,0x80000004,0x80000080,0x80000006,0x00000049,0x00000103,0x80000009,0x80000012,0x80000202,0x00000018 } }
+, {1,50,2,65,0,12, { 0x20000030,0x60000000,0xe000002a,0x20000043,0xb0000040,0xd0000053,0xd0000022,0x20000000,0x60000032,0x60000043,0x20000040,0xe0000042,0x60000002,0x80000001,0x00000020,0x00000003,0x40000052,0x40000040,0xe0000052,0xa0000000,0x80000040,0x20000001,0x20000060,0x80000001,0x40000042,0xc0000043,0x40000022,0x00000003,0x40000042,0xc0000043,0xc0000022,0x00000001,0x40000002,0xc0000043,0x40000062,0x80000001,0x40000042,0x40000042,0x40000002,0x00000002,0x00000040,0x80000002,0x80000000,0x80000002,0x80000040,0x00000000,0x80000040,0x80000000,0x00000040,0x80000000,0x00000040,0x80000002,0x00000000,0x80000000,0x80000000,0x00000000,0x00000000,0x00000000,0x00000000,0x00000000,0x00000000,0x00000000,0x00000000,0x00000000,0x00000000,0x00000004,0x00000080,0x00000004,0x00000009,0x00000101,0x00000009,0x00000012,0x00000202,0x0000001a,0x00000124,0x0000040c,0x00000026,0x0000004a,0x0000080a,0x00000060 } }
+, {1,51,0,65,0,13, { 0xe8000000,0x0800000c,0x18000000,0xb800000a,0xc8000010,0x2c000010,0xf4000014,0xb4000008,0x08000000,0x9800000c,0xd8000010,0x08000010,0xb8000010,0x98000000,0x60000000,0x00000008,0xc0000000,0x90000014,0x10000010,0xb8000014,0x28000000,0x20000010,0x48000000,0x08000018,0x60000000,0x90000010,0xf0000010,0x90000008,0xc0000000,0x90000010,0xf0000010,0xb0000008,0x40000000,0x90000000,0xf0000010,0x90000018,0x60000000,0x90000010,0x90000010,0x90000000,0x80000000,0x00000010,0xa0000000,0x20000000,0xa0000000,0x20000010,0x00000000,0x20000010,0x20000000,0x00000010,0x20000000,0x00000010,0xa0000000,0x00000000,0x20000000,0x20000000,0x00000000,0x00000000,0x00000000,0x00000000,0x00000000,0x00000000,0x00000000,0x00000000,0x00000000,0x00000000,0x00000001,0x00000020,0x00000001,0x40000002,0x40000040,0x40000002,0x80000004,0x80000080,0x80000006,0x00000049,0x00000103,0x80000009,0x80000012,0x80000202 } }
+, {1,51,2,65,0,14, { 0xa0000003,0x20000030,0x60000000,0xe000002a,0x20000043,0xb0000040,0xd0000053,0xd0000022,0x20000000,0x60000032,0x60000043,0x20000040,0xe0000042,0x60000002,0x80000001,0x00000020,0x00000003,0x40000052,0x40000040,0xe0000052,0xa0000000,0x80000040,0x20000001,0x20000060,0x80000001,0x40000042,0xc0000043,0x40000022,0x00000003,0x40000042,0xc0000043,0xc0000022,0x00000001,0x40000002,0xc0000043,0x40000062,0x80000001,0x40000042,0x40000042,0x40000002,0x00000002,0x00000040,0x80000002,0x80000000,0x80000002,0x80000040,0x00000000,0x80000040,0x80000000,0x00000040,0x80000000,0x00000040,0x80000002,0x00000000,0x80000000,0x80000000,0x00000000,0x00000000,0x00000000,0x00000000,0x00000000,0x00000000,0x00000000,0x00000000,0x00000000,0x00000000,0x00000004,0x00000080,0x00000004,0x00000009,0x00000101,0x00000009,0x00000012,0x00000202,0x0000001a,0x00000124,0x0000040c,0x00000026,0x0000004a,0x0000080a } }
+, {1,52,0,65,0,15, { 0x04000010,0xe8000000,0x0800000c,0x18000000,0xb800000a,0xc8000010,0x2c000010,0xf4000014,0xb4000008,0x08000000,0x9800000c,0xd8000010,0x08000010,0xb8000010,0x98000000,0x60000000,0x00000008,0xc0000000,0x90000014,0x10000010,0xb8000014,0x28000000,0x20000010,0x48000000,0x08000018,0x60000000,0x90000010,0xf0000010,0x90000008,0xc0000000,0x90000010,0xf0000010,0xb0000008,0x40000000,0x90000000,0xf0000010,0x90000018,0x60000000,0x90000010,0x90000010,0x90000000,0x80000000,0x00000010,0xa0000000,0x20000000,0xa0000000,0x20000010,0x00000000,0x20000010,0x20000000,0x00000010,0x20000000,0x00000010,0xa0000000,0x00000000,0x20000000,0x20000000,0x00000000,0x00000000,0x00000000,0x00000000,0x00000000,0x00000000,0x00000000,0x00000000,0x00000000,0x00000000,0x00000001,0x00000020,0x00000001,0x40000002,0x40000040,0x40000002,0x80000004,0x80000080,0x80000006,0x00000049,0x00000103,0x80000009,0x80000012 } }
+, {2,45,0,58,0,16, { 0xec000014,0x0c000002,0xc0000010,0xb400001c,0x2c000004,0xbc000018,0xb0000010,0x0000000c,0xb8000010,0x08000018,0x78000010,0x08000014,0x70000010,0xb800001c,0xe8000000,0xb0000004,0x58000010,0xb000000c,0x48000000,0xb0000000,0xb8000010,0x98000010,0xa0000000,0x00000000,0x00000000,0x20000000,0x80000000,0x00000010,0x00000000,0x20000010,0x20000000,0x00000010,0x60000000,0x00000018,0xe0000000,0x90000000,0x30000010,0xb0000000,0x20000000,0x20000000,0xa0000000,0x00000010,0x80000000,0x20000000,0x20000000,0x20000000,0x80000000,0x00000010,0x00000000,0x20000010,0xa0000000,0x00000000,0x20000000,0x20000000,0x00000000,0x00000000,0x00000000,0x00000000,0x00000000,0x00000000,0x00000001,0x00000020,0x00000001,0x40000002,0x40000041,0x40000022,0x80000005,0xc0000082,0xc0000046,0x4000004b,0x80000107,0x00000089,0x00000014,0x8000024b,0x0000011b,0x8000016d,0x8000041a,0x000002e4,0x80000054,0x00000967 } }
+, {2,46,0,58,0,17, { 0x2400001c,0xec000014,0x0c000002,0xc0000010,0xb400001c,0x2c000004,0xbc000018,0xb0000010,0x0000000c,0xb8000010,0x08000018,0x78000010,0x08000014,0x70000010,0xb800001c,0xe8000000,0xb0000004,0x58000010,0xb000000c,0x48000000,0xb0000000,0xb8000010,0x98000010,0xa0000000,0x00000000,0x00000000,0x20000000,0x80000000,0x00000010,0x00000000,0x20000010,0x20000000,0x00000010,0x60000000,0x00000018,0xe0000000,0x90000000,0x30000010,0xb0000000,0x20000000,0x20000000,0xa0000000,0x00000010,0x80000000,0x20000000,0x20000000,0x20000000,0x80000000,0x00000010,0x00000000,0x20000010,0xa0000000,0x00000000,0x20000000,0x20000000,0x00000000,0x00000000,0x00000000,0x00000000,0x00000000,0x00000000,0x00000001,0x00000020,0x00000001,0x40000002,0x40000041,0x40000022,0x80000005,0xc0000082,0xc0000046,0x4000004b,0x80000107,0x00000089,0x00000014,0x8000024b,0x0000011b,0x8000016d,0x8000041a,0x000002e4,0x80000054 } }
+, {2,46,2,58,0,18, { 0x90000070,0xb0000053,0x30000008,0x00000043,0xd0000072,0xb0000010,0xf0000062,0xc0000042,0x00000030,0xe0000042,0x20000060,0xe0000041,0x20000050,0xc0000041,0xe0000072,0xa0000003,0xc0000012,0x60000041,0xc0000032,0x20000001,0xc0000002,0xe0000042,0x60000042,0x80000002,0x00000000,0x00000000,0x80000000,0x00000002,0x00000040,0x00000000,0x80000040,0x80000000,0x00000040,0x80000001,0x00000060,0x80000003,0x40000002,0xc0000040,0xc0000002,0x80000000,0x80000000,0x80000002,0x00000040,0x00000002,0x80000000,0x80000000,0x80000000,0x00000002,0x00000040,0x00000000,0x80000040,0x80000002,0x00000000,0x80000000,0x80000000,0x00000000,0x00000000,0x00000000,0x00000000,0x00000000,0x00000000,0x00000004,0x00000080,0x00000004,0x00000009,0x00000105,0x00000089,0x00000016,0x0000020b,0x0000011b,0x0000012d,0x0000041e,0x00000224,0x00000050,0x0000092e,0x0000046c,0x000005b6,0x0000106a,0x00000b90,0x00000152 } }
+, {2,47,0,58,0,19, { 0x20000010,0x2400001c,0xec000014,0x0c000002,0xc0000010,0xb400001c,0x2c000004,0xbc000018,0xb0000010,0x0000000c,0xb8000010,0x08000018,0x78000010,0x08000014,0x70000010,0xb800001c,0xe8000000,0xb0000004,0x58000010,0xb000000c,0x48000000,0xb0000000,0xb8000010,0x98000010,0xa0000000,0x00000000,0x00000000,0x20000000,0x80000000,0x00000010,0x00000000,0x20000010,0x20000000,0x00000010,0x60000000,0x00000018,0xe0000000,0x90000000,0x30000010,0xb0000000,0x20000000,0x20000000,0xa0000000,0x00000010,0x80000000,0x20000000,0x20000000,0x20000000,0x80000000,0x00000010,0x00000000,0x20000010,0xa0000000,0x00000000,0x20000000,0x20000000,0x00000000,0x00000000,0x00000000,0x00000000,0x00000000,0x00000000,0x00000001,0x00000020,0x00000001,0x40000002,0x40000041,0x40000022,0x80000005,0xc0000082,0xc0000046,0x4000004b,0x80000107,0x00000089,0x00000014,0x8000024b,0x0000011b,0x8000016d,0x8000041a,0x000002e4 } }
+, {2,48,0,58,0,20, { 0xbc00001a,0x20000010,0x2400001c,0xec000014,0x0c000002,0xc0000010,0xb400001c,0x2c000004,0xbc000018,0xb0000010,0x0000000c,0xb8000010,0x08000018,0x78000010,0x08000014,0x70000010,0xb800001c,0xe8000000,0xb0000004,0x58000010,0xb000000c,0x48000000,0xb0000000,0xb8000010,0x98000010,0xa0000000,0x00000000,0x00000000,0x20000000,0x80000000,0x00000010,0x00000000,0x20000010,0x20000000,0x00000010,0x60000000,0x00000018,0xe0000000,0x90000000,0x30000010,0xb0000000,0x20000000,0x20000000,0xa0000000,0x00000010,0x80000000,0x20000000,0x20000000,0x20000000,0x80000000,0x00000010,0x00000000,0x20000010,0xa0000000,0x00000000,0x20000000,0x20000000,0x00000000,0x00000000,0x00000000,0x00000000,0x00000000,0x00000000,0x00000001,0x00000020,0x00000001,0x40000002,0x40000041,0x40000022,0x80000005,0xc0000082,0xc0000046,0x4000004b,0x80000107,0x00000089,0x00000014,0x8000024b,0x0000011b,0x8000016d,0x8000041a } }
+, {2,49,0,58,0,21, { 0x3c000004,0xbc00001a,0x20000010,0x2400001c,0xec000014,0x0c000002,0xc0000010,0xb400001c,0x2c000004,0xbc000018,0xb0000010,0x0000000c,0xb8000010,0x08000018,0x78000010,0x08000014,0x70000010,0xb800001c,0xe8000000,0xb0000004,0x58000010,0xb000000c,0x48000000,0xb0000000,0xb8000010,0x98000010,0xa0000000,0x00000000,0x00000000,0x20000000,0x80000000,0x00000010,0x00000000,0x20000010,0x20000000,0x00000010,0x60000000,0x00000018,0xe0000000,0x90000000,0x30000010,0xb0000000,0x20000000,0x20000000,0xa0000000,0x00000010,0x80000000,0x20000000,0x20000000,0x20000000,0x80000000,0x00000010,0x00000000,0x20000010,0xa0000000,0x00000000,0x20000000,0x20000000,0x00000000,0x00000000,0x00000000,0x00000000,0x00000000,0x00000000,0x00000001,0x00000020,0x00000001,0x40000002,0x40000041,0x40000022,0x80000005,0xc0000082,0xc0000046,0x4000004b,0x80000107,0x00000089,0x00000014,0x8000024b,0x0000011b,0x8000016d } }
+, {2,49,2,58,0,22, { 0xf0000010,0xf000006a,0x80000040,0x90000070,0xb0000053,0x30000008,0x00000043,0xd0000072,0xb0000010,0xf0000062,0xc0000042,0x00000030,0xe0000042,0x20000060,0xe0000041,0x20000050,0xc0000041,0xe0000072,0xa0000003,0xc0000012,0x60000041,0xc0000032,0x20000001,0xc0000002,0xe0000042,0x60000042,0x80000002,0x00000000,0x00000000,0x80000000,0x00000002,0x00000040,0x00000000,0x80000040,0x80000000,0x00000040,0x80000001,0x00000060,0x80000003,0x40000002,0xc0000040,0xc0000002,0x80000000,0x80000000,0x80000002,0x00000040,0x00000002,0x80000000,0x80000000,0x80000000,0x00000002,0x00000040,0x00000000,0x80000040,0x80000002,0x00000000,0x80000000,0x80000000,0x00000000,0x00000000,0x00000000,0x00000000,0x00000000,0x00000000,0x00000004,0x00000080,0x00000004,0x00000009,0x00000105,0x00000089,0x00000016,0x0000020b,0x0000011b,0x0000012d,0x0000041e,0x00000224,0x00000050,0x0000092e,0x0000046c,0x000005b6 } }
+, {2,50,0,65,0,23, { 0xb400001c,0x3c000004,0xbc00001a,0x20000010,0x2400001c,0xec000014,0x0c000002,0xc0000010,0xb400001c,0x2c000004,0xbc000018,0xb0000010,0x0000000c,0xb8000010,0x08000018,0x78000010,0x08000014,0x70000010,0xb800001c,0xe8000000,0xb0000004,0x58000010,0xb000000c,0x48000000,0xb0000000,0xb8000010,0x98000010,0xa0000000,0x00000000,0x00000000,0x20000000,0x80000000,0x00000010,0x00000000,0x20000010,0x20000000,0x00000010,0x60000000,0x00000018,0xe0000000,0x90000000,0x30000010,0xb0000000,0x20000000,0x20000000,0xa0000000,0x00000010,0x80000000,0x20000000,0x20000000,0x20000000,0x80000000,0x00000010,0x00000000,0x20000010,0xa0000000,0x00000000,0x20000000,0x20000000,0x00000000,0x00000000,0x00000000,0x00000000,0x00000000,0x00000000,0x00000001,0x00000020,0x00000001,0x40000002,0x40000041,0x40000022,0x80000005,0xc0000082,0xc0000046,0x4000004b,0x80000107,0x00000089,0x00000014,0x8000024b,0x0000011b } }
+, {2,50,2,65,0,24, { 0xd0000072,0xf0000010,0xf000006a,0x80000040,0x90000070,0xb0000053,0x30000008,0x00000043,0xd0000072,0xb0000010,0xf0000062,0xc0000042,0x00000030,0xe0000042,0x20000060,0xe0000041,0x20000050,0xc0000041,0xe0000072,0xa0000003,0xc0000012,0x60000041,0xc0000032,0x20000001,0xc0000002,0xe0000042,0x60000042,0x80000002,0x00000000,0x00000000,0x80000000,0x00000002,0x00000040,0x00000000,0x80000040,0x80000000,0x00000040,0x80000001,0x00000060,0x80000003,0x40000002,0xc0000040,0xc0000002,0x80000000,0x80000000,0x80000002,0x00000040,0x00000002,0x80000000,0x80000000,0x80000000,0x00000002,0x00000040,0x00000000,0x80000040,0x80000002,0x00000000,0x80000000,0x80000000,0x00000000,0x00000000,0x00000000,0x00000000,0x00000000,0x00000000,0x00000004,0x00000080,0x00000004,0x00000009,0x00000105,0x00000089,0x00000016,0x0000020b,0x0000011b,0x0000012d,0x0000041e,0x00000224,0x00000050,0x0000092e,0x0000046c } }
+, {2,51,0,65,0,25, { 0xc0000010,0xb400001c,0x3c000004,0xbc00001a,0x20000010,0x2400001c,0xec000014,0x0c000002,0xc0000010,0xb400001c,0x2c000004,0xbc000018,0xb0000010,0x0000000c,0xb8000010,0x08000018,0x78000010,0x08000014,0x70000010,0xb800001c,0xe8000000,0xb0000004,0x58000010,0xb000000c,0x48000000,0xb0000000,0xb8000010,0x98000010,0xa0000000,0x00000000,0x00000000,0x20000000,0x80000000,0x00000010,0x00000000,0x20000010,0x20000000,0x00000010,0x60000000,0x00000018,0xe0000000,0x90000000,0x30000010,0xb0000000,0x20000000,0x20000000,0xa0000000,0x00000010,0x80000000,0x20000000,0x20000000,0x20000000,0x80000000,0x00000010,0x00000000,0x20000010,0xa0000000,0x00000000,0x20000000,0x20000000,0x00000000,0x00000000,0x00000000,0x00000000,0x00000000,0x00000000,0x00000001,0x00000020,0x00000001,0x40000002,0x40000041,0x40000022,0x80000005,0xc0000082,0xc0000046,0x4000004b,0x80000107,0x00000089,0x00000014,0x8000024b } }
+, {2,51,2,65,0,26, { 0x00000043,0xd0000072,0xf0000010,0xf000006a,0x80000040,0x90000070,0xb0000053,0x30000008,0x00000043,0xd0000072,0xb0000010,0xf0000062,0xc0000042,0x00000030,0xe0000042,0x20000060,0xe0000041,0x20000050,0xc0000041,0xe0000072,0xa0000003,0xc0000012,0x60000041,0xc0000032,0x20000001,0xc0000002,0xe0000042,0x60000042,0x80000002,0x00000000,0x00000000,0x80000000,0x00000002,0x00000040,0x00000000,0x80000040,0x80000000,0x00000040,0x80000001,0x00000060,0x80000003,0x40000002,0xc0000040,0xc0000002,0x80000000,0x80000000,0x80000002,0x00000040,0x00000002,0x80000000,0x80000000,0x80000000,0x00000002,0x00000040,0x00000000,0x80000040,0x80000002,0x00000000,0x80000000,0x80000000,0x00000000,0x00000000,0x00000000,0x00000000,0x00000000,0x00000000,0x00000004,0x00000080,0x00000004,0x00000009,0x00000105,0x00000089,0x00000016,0x0000020b,0x0000011b,0x0000012d,0x0000041e,0x00000224,0x00000050,0x0000092e } }
+, {2,52,0,65,0,27, { 0x0c000002,0xc0000010,0xb400001c,0x3c000004,0xbc00001a,0x20000010,0x2400001c,0xec000014,0x0c000002,0xc0000010,0xb400001c,0x2c000004,0xbc000018,0xb0000010,0x0000000c,0xb8000010,0x08000018,0x78000010,0x08000014,0x70000010,0xb800001c,0xe8000000,0xb0000004,0x58000010,0xb000000c,0x48000000,0xb0000000,0xb8000010,0x98000010,0xa0000000,0x00000000,0x00000000,0x20000000,0x80000000,0x00000010,0x00000000,0x20000010,0x20000000,0x00000010,0x60000000,0x00000018,0xe0000000,0x90000000,0x30000010,0xb0000000,0x20000000,0x20000000,0xa0000000,0x00000010,0x80000000,0x20000000,0x20000000,0x20000000,0x80000000,0x00000010,0x00000000,0x20000010,0xa0000000,0x00000000,0x20000000,0x20000000,0x00000000,0x00000000,0x00000000,0x00000000,0x00000000,0x00000000,0x00000001,0x00000020,0x00000001,0x40000002,0x40000041,0x40000022,0x80000005,0xc0000082,0xc0000046,0x4000004b,0x80000107,0x00000089,0x00000014 } }
+, {2,53,0,65,0,28, { 0xcc000014,0x0c000002,0xc0000010,0xb400001c,0x3c000004,0xbc00001a,0x20000010,0x2400001c,0xec000014,0x0c000002,0xc0000010,0xb400001c,0x2c000004,0xbc000018,0xb0000010,0x0000000c,0xb8000010,0x08000018,0x78000010,0x08000014,0x70000010,0xb800001c,0xe8000000,0xb0000004,0x58000010,0xb000000c,0x48000000,0xb0000000,0xb8000010,0x98000010,0xa0000000,0x00000000,0x00000000,0x20000000,0x80000000,0x00000010,0x00000000,0x20000010,0x20000000,0x00000010,0x60000000,0x00000018,0xe0000000,0x90000000,0x30000010,0xb0000000,0x20000000,0x20000000,0xa0000000,0x00000010,0x80000000,0x20000000,0x20000000,0x20000000,0x80000000,0x00000010,0x00000000,0x20000010,0xa0000000,0x00000000,0x20000000,0x20000000,0x00000000,0x00000000,0x00000000,0x00000000,0x00000000,0x00000000,0x00000001,0x00000020,0x00000001,0x40000002,0x40000041,0x40000022,0x80000005,0xc0000082,0xc0000046,0x4000004b,0x80000107,0x00000089 } }
+, {2,54,0,65,0,29, { 0x0400001c,0xcc000014,0x0c000002,0xc0000010,0xb400001c,0x3c000004,0xbc00001a,0x20000010,0x2400001c,0xec000014,0x0c000002,0xc0000010,0xb400001c,0x2c000004,0xbc000018,0xb0000010,0x0000000c,0xb8000010,0x08000018,0x78000010,0x08000014,0x70000010,0xb800001c,0xe8000000,0xb0000004,0x58000010,0xb000000c,0x48000000,0xb0000000,0xb8000010,0x98000010,0xa0000000,0x00000000,0x00000000,0x20000000,0x80000000,0x00000010,0x00000000,0x20000010,0x20000000,0x00000010,0x60000000,0x00000018,0xe0000000,0x90000000,0x30000010,0xb0000000,0x20000000,0x20000000,0xa0000000,0x00000010,0x80000000,0x20000000,0x20000000,0x20000000,0x80000000,0x00000010,0x00000000,0x20000010,0xa0000000,0x00000000,0x20000000,0x20000000,0x00000000,0x00000000,0x00000000,0x00000000,0x00000000,0x00000000,0x00000001,0x00000020,0x00000001,0x40000002,0x40000041,0x40000022,0x80000005,0xc0000082,0xc0000046,0x4000004b,0x80000107 } }
+, {2,55,0,65,0,30, { 0x00000010,0x0400001c,0xcc000014,0x0c000002,0xc0000010,0xb400001c,0x3c000004,0xbc00001a,0x20000010,0x2400001c,0xec000014,0x0c000002,0xc0000010,0xb400001c,0x2c000004,0xbc000018,0xb0000010,0x0000000c,0xb8000010,0x08000018,0x78000010,0x08000014,0x70000010,0xb800001c,0xe8000000,0xb0000004,0x58000010,0xb000000c,0x48000000,0xb0000000,0xb8000010,0x98000010,0xa0000000,0x00000000,0x00000000,0x20000000,0x80000000,0x00000010,0x00000000,0x20000010,0x20000000,0x00000010,0x60000000,0x00000018,0xe0000000,0x90000000,0x30000010,0xb0000000,0x20000000,0x20000000,0xa0000000,0x00000010,0x80000000,0x20000000,0x20000000,0x20000000,0x80000000,0x00000010,0x00000000,0x20000010,0xa0000000,0x00000000,0x20000000,0x20000000,0x00000000,0x00000000,0x00000000,0x00000000,0x00000000,0x00000000,0x00000001,0x00000020,0x00000001,0x40000002,0x40000041,0x40000022,0x80000005,0xc0000082,0xc0000046,0x4000004b } }
+, {2,56,0,65,0,31, { 0x2600001a,0x00000010,0x0400001c,0xcc000014,0x0c000002,0xc0000010,0xb400001c,0x3c000004,0xbc00001a,0x20000010,0x2400001c,0xec000014,0x0c000002,0xc0000010,0xb400001c,0x2c000004,0xbc000018,0xb0000010,0x0000000c,0xb8000010,0x08000018,0x78000010,0x08000014,0x70000010,0xb800001c,0xe8000000,0xb0000004,0x58000010,0xb000000c,0x48000000,0xb0000000,0xb8000010,0x98000010,0xa0000000,0x00000000,0x00000000,0x20000000,0x80000000,0x00000010,0x00000000,0x20000010,0x20000000,0x00000010,0x60000000,0x00000018,0xe0000000,0x90000000,0x30000010,0xb0000000,0x20000000,0x20000000,0xa0000000,0x00000010,0x80000000,0x20000000,0x20000000,0x20000000,0x80000000,0x00000010,0x00000000,0x20000010,0xa0000000,0x00000000,0x20000000,0x20000000,0x00000000,0x00000000,0x00000000,0x00000000,0x00000000,0x00000000,0x00000001,0x00000020,0x00000001,0x40000002,0x40000041,0x40000022,0x80000005,0xc0000082,0xc0000046 } }
+, {0,0,0,0,0,0, {0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0}}
+};
+void ubc_check(const uint32_t W[80], uint32_t dvmask[1])
+{
+	uint32_t mask = ~((uint32_t)(0));
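+	// branch-free filtering: each inner term below evaluates to all-ones
+	// when the unavoidable bitcondition for the listed DVs holds, and to a
+	// value with exactly those DV bits cleared when it fails, so the AND
+	// drops precisely the DVs whose condition failed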
+	mask &= (((((W[44]^W[45])>>29)&1)-1) | ~(DV_I_48_0_bit|DV_I_51_0_bit|DV_I_52_0_bit|DV_II_45_0_bit|DV_II_46_0_bit|DV_II_50_0_bit|DV_II_51_0_bit));
+	mask &= (((((W[49]^W[50])>>29)&1)-1) | ~(DV_I_46_0_bit|DV_II_45_0_bit|DV_II_50_0_bit|DV_II_51_0_bit|DV_II_55_0_bit|DV_II_56_0_bit));
+	mask &= (((((W[48]^W[49])>>29)&1)-1) | ~(DV_I_45_0_bit|DV_I_52_0_bit|DV_II_49_0_bit|DV_II_50_0_bit|DV_II_54_0_bit|DV_II_55_0_bit));
+	mask &= ((((W[47]^(W[50]>>25))&(1<<4))-(1<<4)) | ~(DV_I_47_0_bit|DV_I_49_0_bit|DV_I_51_0_bit|DV_II_45_0_bit|DV_II_51_0_bit|DV_II_56_0_bit));
+	mask &= (((((W[47]^W[48])>>29)&1)-1) | ~(DV_I_44_0_bit|DV_I_51_0_bit|DV_II_48_0_bit|DV_II_49_0_bit|DV_II_53_0_bit|DV_II_54_0_bit));
+	mask &= (((((W[46]>>4)^(W[49]>>29))&1)-1) | ~(DV_I_46_0_bit|DV_I_48_0_bit|DV_I_50_0_bit|DV_I_52_0_bit|DV_II_50_0_bit|DV_II_55_0_bit));
+	mask &= (((((W[46]^W[47])>>29)&1)-1) | ~(DV_I_43_0_bit|DV_I_50_0_bit|DV_II_47_0_bit|DV_II_48_0_bit|DV_II_52_0_bit|DV_II_53_0_bit));
+	mask &= (((((W[45]>>4)^(W[48]>>29))&1)-1) | ~(DV_I_45_0_bit|DV_I_47_0_bit|DV_I_49_0_bit|DV_I_51_0_bit|DV_II_49_0_bit|DV_II_54_0_bit));
+	mask &= (((((W[45]^W[46])>>29)&1)-1) | ~(DV_I_49_0_bit|DV_I_52_0_bit|DV_II_46_0_bit|DV_II_47_0_bit|DV_II_51_0_bit|DV_II_52_0_bit));
+	mask &= (((((W[44]>>4)^(W[47]>>29))&1)-1) | ~(DV_I_44_0_bit|DV_I_46_0_bit|DV_I_48_0_bit|DV_I_50_0_bit|DV_II_48_0_bit|DV_II_53_0_bit));
+	mask &= (((((W[43]>>4)^(W[46]>>29))&1)-1) | ~(DV_I_43_0_bit|DV_I_45_0_bit|DV_I_47_0_bit|DV_I_49_0_bit|DV_II_47_0_bit|DV_II_52_0_bit));
+	mask &= (((((W[43]^W[44])>>29)&1)-1) | ~(DV_I_47_0_bit|DV_I_50_0_bit|DV_I_51_0_bit|DV_II_45_0_bit|DV_II_49_0_bit|DV_II_50_0_bit));
+	mask &= (((((W[42]>>4)^(W[45]>>29))&1)-1) | ~(DV_I_44_0_bit|DV_I_46_0_bit|DV_I_48_0_bit|DV_I_52_0_bit|DV_II_46_0_bit|DV_II_51_0_bit));
+	mask &= (((((W[41]>>4)^(W[44]>>29))&1)-1) | ~(DV_I_43_0_bit|DV_I_45_0_bit|DV_I_47_0_bit|DV_I_51_0_bit|DV_II_45_0_bit|DV_II_50_0_bit));
+	mask &= (((((W[40]^W[41])>>29)&1)-1) | ~(DV_I_44_0_bit|DV_I_47_0_bit|DV_I_48_0_bit|DV_II_46_0_bit|DV_II_47_0_bit|DV_II_56_0_bit));
+	mask &= (((((W[54]^W[55])>>29)&1)-1) | ~(DV_I_51_0_bit|DV_II_47_0_bit|DV_II_50_0_bit|DV_II_55_0_bit|DV_II_56_0_bit));
+	mask &= (((((W[53]^W[54])>>29)&1)-1) | ~(DV_I_50_0_bit|DV_II_46_0_bit|DV_II_49_0_bit|DV_II_54_0_bit|DV_II_55_0_bit));
+	mask &= (((((W[52]^W[53])>>29)&1)-1) | ~(DV_I_49_0_bit|DV_II_45_0_bit|DV_II_48_0_bit|DV_II_53_0_bit|DV_II_54_0_bit));
+	mask &= ((((W[50]^(W[53]>>25))&(1<<4))-(1<<4)) | ~(DV_I_50_0_bit|DV_I_52_0_bit|DV_II_46_0_bit|DV_II_48_0_bit|DV_II_54_0_bit));
+	mask &= (((((W[50]^W[51])>>29)&1)-1) | ~(DV_I_47_0_bit|DV_II_46_0_bit|DV_II_51_0_bit|DV_II_52_0_bit|DV_II_56_0_bit));
+	mask &= ((((W[49]^(W[52]>>25))&(1<<4))-(1<<4)) | ~(DV_I_49_0_bit|DV_I_51_0_bit|DV_II_45_0_bit|DV_II_47_0_bit|DV_II_53_0_bit));
+	mask &= ((((W[48]^(W[51]>>25))&(1<<4))-(1<<4)) | ~(DV_I_48_0_bit|DV_I_50_0_bit|DV_I_52_0_bit|DV_II_46_0_bit|DV_II_52_0_bit));
+	mask &= (((((W[42]^W[43])>>29)&1)-1) | ~(DV_I_46_0_bit|DV_I_49_0_bit|DV_I_50_0_bit|DV_II_48_0_bit|DV_II_49_0_bit));
+	mask &= (((((W[41]^W[42])>>29)&1)-1) | ~(DV_I_45_0_bit|DV_I_48_0_bit|DV_I_49_0_bit|DV_II_47_0_bit|DV_II_48_0_bit));
+	mask &= (((((W[40]>>4)^(W[43]>>29))&1)-1) | ~(DV_I_44_0_bit|DV_I_46_0_bit|DV_I_50_0_bit|DV_II_49_0_bit|DV_II_56_0_bit));
+	mask &= (((((W[39]>>4)^(W[42]>>29))&1)-1) | ~(DV_I_43_0_bit|DV_I_45_0_bit|DV_I_49_0_bit|DV_II_48_0_bit|DV_II_55_0_bit));
+	if (mask & (DV_I_44_0_bit|DV_I_48_0_bit|DV_II_47_0_bit|DV_II_54_0_bit|DV_II_56_0_bit))
+		mask &= (((((W[38]>>4)^(W[41]>>29))&1)-1) | ~(DV_I_44_0_bit|DV_I_48_0_bit|DV_II_47_0_bit|DV_II_54_0_bit|DV_II_56_0_bit));
+	mask &= (((((W[37]>>4)^(W[40]>>29))&1)-1) | ~(DV_I_43_0_bit|DV_I_47_0_bit|DV_II_46_0_bit|DV_II_53_0_bit|DV_II_55_0_bit));
+	if (mask & (DV_I_52_0_bit|DV_II_48_0_bit|DV_II_51_0_bit|DV_II_56_0_bit))
+		mask &= (((((W[55]^W[56])>>29)&1)-1) | ~(DV_I_52_0_bit|DV_II_48_0_bit|DV_II_51_0_bit|DV_II_56_0_bit));
+	if (mask & (DV_I_52_0_bit|DV_II_48_0_bit|DV_II_50_0_bit|DV_II_56_0_bit))
+		mask &= ((((W[52]^(W[55]>>25))&(1<<4))-(1<<4)) | ~(DV_I_52_0_bit|DV_II_48_0_bit|DV_II_50_0_bit|DV_II_56_0_bit));
+	if (mask & (DV_I_51_0_bit|DV_II_47_0_bit|DV_II_49_0_bit|DV_II_55_0_bit))
+		mask &= ((((W[51]^(W[54]>>25))&(1<<4))-(1<<4)) | ~(DV_I_51_0_bit|DV_II_47_0_bit|DV_II_49_0_bit|DV_II_55_0_bit));
+	if (mask & (DV_I_48_0_bit|DV_II_47_0_bit|DV_II_52_0_bit|DV_II_53_0_bit))
+		mask &= (((((W[51]^W[52])>>29)&1)-1) | ~(DV_I_48_0_bit|DV_II_47_0_bit|DV_II_52_0_bit|DV_II_53_0_bit));
+	if (mask & (DV_I_46_0_bit|DV_I_49_0_bit|DV_II_45_0_bit|DV_II_48_0_bit))
+		mask &= (((((W[36]>>4)^(W[40]>>29))&1)-1) | ~(DV_I_46_0_bit|DV_I_49_0_bit|DV_II_45_0_bit|DV_II_48_0_bit));
+	if (mask & (DV_I_52_0_bit|DV_II_48_0_bit|DV_II_49_0_bit))
+		mask &= ((0-(((W[53]^W[56])>>29)&1)) | ~(DV_I_52_0_bit|DV_II_48_0_bit|DV_II_49_0_bit));
+	if (mask & (DV_I_50_0_bit|DV_II_46_0_bit|DV_II_47_0_bit))
+		mask &= ((0-(((W[51]^W[54])>>29)&1)) | ~(DV_I_50_0_bit|DV_II_46_0_bit|DV_II_47_0_bit));
+	if (mask & (DV_I_49_0_bit|DV_I_51_0_bit|DV_II_45_0_bit))
+		mask &= ((0-(((W[50]^W[52])>>29)&1)) | ~(DV_I_49_0_bit|DV_I_51_0_bit|DV_II_45_0_bit));
+	if (mask & (DV_I_48_0_bit|DV_I_50_0_bit|DV_I_52_0_bit))
+		mask &= ((0-(((W[49]^W[51])>>29)&1)) | ~(DV_I_48_0_bit|DV_I_50_0_bit|DV_I_52_0_bit));
+	if (mask & (DV_I_47_0_bit|DV_I_49_0_bit|DV_I_51_0_bit))
+		mask &= ((0-(((W[48]^W[50])>>29)&1)) | ~(DV_I_47_0_bit|DV_I_49_0_bit|DV_I_51_0_bit));
+	if (mask & (DV_I_46_0_bit|DV_I_48_0_bit|DV_I_50_0_bit))
+		mask &= ((0-(((W[47]^W[49])>>29)&1)) | ~(DV_I_46_0_bit|DV_I_48_0_bit|DV_I_50_0_bit));
+	if (mask & (DV_I_45_0_bit|DV_I_47_0_bit|DV_I_49_0_bit))
+		mask &= ((0-(((W[46]^W[48])>>29)&1)) | ~(DV_I_45_0_bit|DV_I_47_0_bit|DV_I_49_0_bit));
+	mask &= ((((W[45]^W[47])&(1<<6))-(1<<6)) | ~(DV_I_47_2_bit|DV_I_49_2_bit|DV_I_51_2_bit));
+	if (mask & (DV_I_44_0_bit|DV_I_46_0_bit|DV_I_48_0_bit))
+		mask &= ((0-(((W[45]^W[47])>>29)&1)) | ~(DV_I_44_0_bit|DV_I_46_0_bit|DV_I_48_0_bit));
+	mask &= (((((W[44]^W[46])>>6)&1)-1) | ~(DV_I_46_2_bit|DV_I_48_2_bit|DV_I_50_2_bit));
+	if (mask & (DV_I_43_0_bit|DV_I_45_0_bit|DV_I_47_0_bit))
+		mask &= ((0-(((W[44]^W[46])>>29)&1)) | ~(DV_I_43_0_bit|DV_I_45_0_bit|DV_I_47_0_bit));
+	mask &= ((0-((W[41]^(W[42]>>5))&(1<<1))) | ~(DV_I_48_2_bit|DV_II_46_2_bit|DV_II_51_2_bit));
+	mask &= ((0-((W[40]^(W[41]>>5))&(1<<1))) | ~(DV_I_47_2_bit|DV_I_51_2_bit|DV_II_50_2_bit));
+	if (mask & (DV_I_44_0_bit|DV_I_46_0_bit|DV_II_56_0_bit))
+		mask &= ((0-(((W[40]^W[42])>>4)&1)) | ~(DV_I_44_0_bit|DV_I_46_0_bit|DV_II_56_0_bit));
+	mask &= ((0-((W[39]^(W[40]>>5))&(1<<1))) | ~(DV_I_46_2_bit|DV_I_50_2_bit|DV_II_49_2_bit));
+	if (mask & (DV_I_43_0_bit|DV_I_45_0_bit|DV_II_55_0_bit))
+		mask &= ((0-(((W[39]^W[41])>>4)&1)) | ~(DV_I_43_0_bit|DV_I_45_0_bit|DV_II_55_0_bit));
+	if (mask & (DV_I_44_0_bit|DV_II_54_0_bit|DV_II_56_0_bit))
+		mask &= ((0-(((W[38]^W[40])>>4)&1)) | ~(DV_I_44_0_bit|DV_II_54_0_bit|DV_II_56_0_bit));
+	if (mask & (DV_I_43_0_bit|DV_II_53_0_bit|DV_II_55_0_bit))
+		mask &= ((0-(((W[37]^W[39])>>4)&1)) | ~(DV_I_43_0_bit|DV_II_53_0_bit|DV_II_55_0_bit));
+	mask &= ((0-((W[36]^(W[37]>>5))&(1<<1))) | ~(DV_I_47_2_bit|DV_I_50_2_bit|DV_II_46_2_bit));
+	if (mask & (DV_I_45_0_bit|DV_I_48_0_bit|DV_II_47_0_bit))
+		mask &= (((((W[35]>>4)^(W[39]>>29))&1)-1) | ~(DV_I_45_0_bit|DV_I_48_0_bit|DV_II_47_0_bit));
+	if (mask & (DV_I_48_0_bit|DV_II_48_0_bit))
+		mask &= ((0-((W[63]^(W[64]>>5))&(1<<0))) | ~(DV_I_48_0_bit|DV_II_48_0_bit));
+	if (mask & (DV_I_45_0_bit|DV_II_45_0_bit))
+		mask &= ((0-((W[63]^(W[64]>>5))&(1<<1))) | ~(DV_I_45_0_bit|DV_II_45_0_bit));
+	if (mask & (DV_I_47_0_bit|DV_II_47_0_bit))
+		mask &= ((0-((W[62]^(W[63]>>5))&(1<<0))) | ~(DV_I_47_0_bit|DV_II_47_0_bit));
+	if (mask & (DV_I_46_0_bit|DV_II_46_0_bit))
+		mask &= ((0-((W[61]^(W[62]>>5))&(1<<0))) | ~(DV_I_46_0_bit|DV_II_46_0_bit));
+	mask &= ((0-((W[61]^(W[62]>>5))&(1<<2))) | ~(DV_I_46_2_bit|DV_II_46_2_bit));
+	if (mask & (DV_I_45_0_bit|DV_II_45_0_bit))
+		mask &= ((0-((W[60]^(W[61]>>5))&(1<<0))) | ~(DV_I_45_0_bit|DV_II_45_0_bit));
+	if (mask & (DV_II_51_0_bit|DV_II_54_0_bit))
+		mask &= (((((W[58]^W[59])>>29)&1)-1) | ~(DV_II_51_0_bit|DV_II_54_0_bit));
+	if (mask & (DV_II_50_0_bit|DV_II_53_0_bit))
+		mask &= (((((W[57]^W[58])>>29)&1)-1) | ~(DV_II_50_0_bit|DV_II_53_0_bit));
+	if (mask & (DV_II_52_0_bit|DV_II_54_0_bit))
+		mask &= ((((W[56]^(W[59]>>25))&(1<<4))-(1<<4)) | ~(DV_II_52_0_bit|DV_II_54_0_bit));
+	if (mask & (DV_II_51_0_bit|DV_II_52_0_bit))
+		mask &= ((0-(((W[56]^W[59])>>29)&1)) | ~(DV_II_51_0_bit|DV_II_52_0_bit));
+	if (mask & (DV_II_49_0_bit|DV_II_52_0_bit))
+		mask &= (((((W[56]^W[57])>>29)&1)-1) | ~(DV_II_49_0_bit|DV_II_52_0_bit));
+	if (mask & (DV_II_51_0_bit|DV_II_53_0_bit))
+		mask &= ((((W[55]^(W[58]>>25))&(1<<4))-(1<<4)) | ~(DV_II_51_0_bit|DV_II_53_0_bit));
+	if (mask & (DV_II_50_0_bit|DV_II_52_0_bit))
+		mask &= ((((W[54]^(W[57]>>25))&(1<<4))-(1<<4)) | ~(DV_II_50_0_bit|DV_II_52_0_bit));
+	if (mask & (DV_II_49_0_bit|DV_II_51_0_bit))
+		mask &= ((((W[53]^(W[56]>>25))&(1<<4))-(1<<4)) | ~(DV_II_49_0_bit|DV_II_51_0_bit));
+	mask &= ((((W[51]^(W[50]>>5))&(1<<1))-(1<<1)) | ~(DV_I_50_2_bit|DV_II_46_2_bit));
+	mask &= ((((W[48]^W[50])&(1<<6))-(1<<6)) | ~(DV_I_50_2_bit|DV_II_46_2_bit));
+	if (mask & (DV_I_51_0_bit|DV_I_52_0_bit))
+		mask &= ((0-(((W[48]^W[55])>>29)&1)) | ~(DV_I_51_0_bit|DV_I_52_0_bit));
+	mask &= ((((W[47]^W[49])&(1<<6))-(1<<6)) | ~(DV_I_49_2_bit|DV_I_51_2_bit));
+	mask &= ((((W[48]^(W[47]>>5))&(1<<1))-(1<<1)) | ~(DV_I_47_2_bit|DV_II_51_2_bit));
+	mask &= ((((W[46]^W[48])&(1<<6))-(1<<6)) | ~(DV_I_48_2_bit|DV_I_50_2_bit));
+	mask &= ((((W[47]^(W[46]>>5))&(1<<1))-(1<<1)) | ~(DV_I_46_2_bit|DV_II_50_2_bit));
+	mask &= ((0-((W[44]^(W[45]>>5))&(1<<1))) | ~(DV_I_51_2_bit|DV_II_49_2_bit));
+	mask &= ((((W[43]^W[45])&(1<<6))-(1<<6)) | ~(DV_I_47_2_bit|DV_I_49_2_bit));
+	mask &= (((((W[42]^W[44])>>6)&1)-1) | ~(DV_I_46_2_bit|DV_I_48_2_bit));
+	mask &= ((((W[43]^(W[42]>>5))&(1<<1))-(1<<1)) | ~(DV_II_46_2_bit|DV_II_51_2_bit));
+	mask &= ((((W[42]^(W[41]>>5))&(1<<1))-(1<<1)) | ~(DV_I_51_2_bit|DV_II_50_2_bit));
+	mask &= ((((W[41]^(W[40]>>5))&(1<<1))-(1<<1)) | ~(DV_I_50_2_bit|DV_II_49_2_bit));
+	if (mask & (DV_I_52_0_bit|DV_II_51_0_bit))
+		mask &= ((((W[39]^(W[43]>>25))&(1<<4))-(1<<4)) | ~(DV_I_52_0_bit|DV_II_51_0_bit));
+	if (mask & (DV_I_51_0_bit|DV_II_50_0_bit))
+		mask &= ((((W[38]^(W[42]>>25))&(1<<4))-(1<<4)) | ~(DV_I_51_0_bit|DV_II_50_0_bit));
+	if (mask & (DV_I_48_2_bit|DV_I_51_2_bit))
+		mask &= ((0-((W[37]^(W[38]>>5))&(1<<1))) | ~(DV_I_48_2_bit|DV_I_51_2_bit));
+	if (mask & (DV_I_50_0_bit|DV_II_49_0_bit))
+		mask &= ((((W[37]^(W[41]>>25))&(1<<4))-(1<<4)) | ~(DV_I_50_0_bit|DV_II_49_0_bit));
+	if (mask & (DV_II_52_0_bit|DV_II_54_0_bit))
+		mask &= ((0-((W[36]^W[38])&(1<<4))) | ~(DV_II_52_0_bit|DV_II_54_0_bit));
+	mask &= ((0-((W[35]^(W[36]>>5))&(1<<1))) | ~(DV_I_46_2_bit|DV_I_49_2_bit));
+	if (mask & (DV_I_51_0_bit|DV_II_47_0_bit))
+		mask &= ((((W[35]^(W[39]>>25))&(1<<3))-(1<<3)) | ~(DV_I_51_0_bit|DV_II_47_0_bit));
+if (mask) {
+
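+	// second pass: for each DV still marked as a candidate, check its
+	// remaining individual bitconditions and clear the DV's bit as soon as
+	// one of them fails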
+	if (mask & DV_I_43_0_bit)
+		 if (
+			    !((W[61]^(W[62]>>5)) & (1<<1))
+			 || !(!((W[59]^(W[63]>>25)) & (1<<5)))
+			 || !((W[58]^(W[63]>>30)) & (1<<0))
+		 )  mask &= ~DV_I_43_0_bit;
+	if (mask & DV_I_44_0_bit)
+		 if (
+			    !((W[62]^(W[63]>>5)) & (1<<1))
+			 || !(!((W[60]^(W[64]>>25)) & (1<<5)))
+			 || !((W[59]^(W[64]>>30)) & (1<<0))
+		 )  mask &= ~DV_I_44_0_bit;
+	if (mask & DV_I_46_2_bit)
+		mask &= ((~((W[40]^W[42])>>2)) | ~DV_I_46_2_bit);
+	if (mask & DV_I_47_2_bit)
+		 if (
+			    !((W[62]^(W[63]>>5)) & (1<<2))
+			 || !(!((W[41]^W[43]) & (1<<6)))
+		 )  mask &= ~DV_I_47_2_bit;
+	if (mask & DV_I_48_2_bit)
+		 if (
+			    !((W[63]^(W[64]>>5)) & (1<<2))
+			 || !(!((W[48]^(W[49]<<5)) & (1<<6)))
+		 )  mask &= ~DV_I_48_2_bit;
+	if (mask & DV_I_49_2_bit)
+		 if (
+			    !(!((W[49]^(W[50]<<5)) & (1<<6)))
+			 || !((W[42]^W[50]) & (1<<1))
+			 || !(!((W[39]^(W[40]<<5)) & (1<<6)))
+			 || !((W[38]^W[40]) & (1<<1))
+		 )  mask &= ~DV_I_49_2_bit;
+	if (mask & DV_I_50_0_bit)
+		mask &= ((((W[36]^W[37])<<7)) | ~DV_I_50_0_bit);
+	if (mask & DV_I_50_2_bit)
+		mask &= ((((W[43]^W[51])<<11)) | ~DV_I_50_2_bit);
+	if (mask & DV_I_51_0_bit)
+		mask &= ((((W[37]^W[38])<<9)) | ~DV_I_51_0_bit);
+	if (mask & DV_I_51_2_bit)
+		 if (
+			    !(!((W[51]^(W[52]<<5)) & (1<<6)))
+			 || !(!((W[49]^W[51]) & (1<<6)))
+			 || !(!((W[37]^(W[37]>>5)) & (1<<1)))
+			 || !(!((W[35]^(W[39]>>25)) & (1<<5)))
+		 )  mask &= ~DV_I_51_2_bit;
+	if (mask & DV_I_52_0_bit)
+		mask &= ((((W[38]^W[39])<<11)) | ~DV_I_52_0_bit);
+	if (mask & DV_II_46_2_bit)
+		mask &= ((((W[47]^W[51])<<17)) | ~DV_II_46_2_bit);
+	if (mask & DV_II_48_0_bit)
+		 if (
+			    !(!((W[36]^(W[40]>>25)) & (1<<3)))
+			 || !((W[35]^(W[40]<<2)) & (1<<30))
+		 )  mask &= ~DV_II_48_0_bit;
+	if (mask & DV_II_49_0_bit)
+		 if (
+			    !(!((W[37]^(W[41]>>25)) & (1<<3)))
+			 || !((W[36]^(W[41]<<2)) & (1<<30))
+		 )  mask &= ~DV_II_49_0_bit;
+	if (mask & DV_II_49_2_bit)
+		 if (
+			    !(!((W[53]^(W[54]<<5)) & (1<<6)))
+			 || !(!((W[51]^W[53]) & (1<<6)))
+			 || !((W[50]^W[54]) & (1<<1))
+			 || !(!((W[45]^(W[46]<<5)) & (1<<6)))
+			 || !(!((W[37]^(W[41]>>25)) & (1<<5)))
+			 || !((W[36]^(W[41]>>30)) & (1<<0))
+		 )  mask &= ~DV_II_49_2_bit;
+	if (mask & DV_II_50_0_bit)
+		 if (
+			    !((W[55]^W[58]) & (1<<29))
+			 || !(!((W[38]^(W[42]>>25)) & (1<<3)))
+			 || !((W[37]^(W[42]<<2)) & (1<<30))
+		 )  mask &= ~DV_II_50_0_bit;
+	if (mask & DV_II_50_2_bit)
+		 if (
+			    !(!((W[54]^(W[55]<<5)) & (1<<6)))
+			 || !(!((W[52]^W[54]) & (1<<6)))
+			 || !((W[51]^W[55]) & (1<<1))
+			 || !((W[45]^W[47]) & (1<<1))
+			 || !(!((W[38]^(W[42]>>25)) & (1<<5)))
+			 || !((W[37]^(W[42]>>30)) & (1<<0))
+		 )  mask &= ~DV_II_50_2_bit;
+	if (mask & DV_II_51_0_bit)
+		 if (
+			    !(!((W[39]^(W[43]>>25)) & (1<<3)))
+			 || !((W[38]^(W[43]<<2)) & (1<<30))
+		 )  mask &= ~DV_II_51_0_bit;
+	if (mask & DV_II_51_2_bit)
+		 if (
+			    !(!((W[55]^(W[56]<<5)) & (1<<6)))
+			 || !(!((W[53]^W[55]) & (1<<6)))
+			 || !((W[52]^W[56]) & (1<<1))
+			 || !((W[46]^W[48]) & (1<<1))
+			 || !(!((W[39]^(W[43]>>25)) & (1<<5)))
+			 || !((W[38]^(W[43]>>30)) & (1<<0))
+		 )  mask &= ~DV_II_51_2_bit;
+	if (mask & DV_II_52_0_bit)
+		 if (
+			    !(!((W[59]^W[60]) & (1<<29)))
+			 || !(!((W[40]^(W[44]>>25)) & (1<<3)))
+			 || !(!((W[40]^(W[44]>>25)) & (1<<4)))
+			 || !((W[39]^(W[44]<<2)) & (1<<30))
+		 )  mask &= ~DV_II_52_0_bit;
+	if (mask & DV_II_53_0_bit)
+		 if (
+			    !((W[58]^W[61]) & (1<<29))
+			 || !(!((W[57]^(W[61]>>25)) & (1<<4)))
+			 || !(!((W[41]^(W[45]>>25)) & (1<<3)))
+			 || !(!((W[41]^(W[45]>>25)) & (1<<4)))
+		 )  mask &= ~DV_II_53_0_bit;
+	if (mask & DV_II_54_0_bit)
+		 if (
+			    !(!((W[58]^(W[62]>>25)) & (1<<4)))
+			 || !(!((W[42]^(W[46]>>25)) & (1<<3)))
+			 || !(!((W[42]^(W[46]>>25)) & (1<<4)))
+		 )  mask &= ~DV_II_54_0_bit;
+	if (mask & DV_II_55_0_bit)
+		 if (
+			    !(!((W[59]^(W[63]>>25)) & (1<<4)))
+			 || !(!((W[57]^(W[59]>>25)) & (1<<4)))
+			 || !(!((W[43]^(W[47]>>25)) & (1<<3)))
+			 || !(!((W[43]^(W[47]>>25)) & (1<<4)))
+		 )  mask &= ~DV_II_55_0_bit;
+	if (mask & DV_II_56_0_bit)
+		 if (
+			    !(!((W[60]^(W[64]>>25)) & (1<<4)))
+			 || !(!((W[44]^(W[48]>>25)) & (1<<3)))
+			 || !(!((W[44]^(W[48]>>25)) & (1<<4)))
+		 )  mask &= ~DV_II_56_0_bit;
+}
+
+	dvmask[0]=mask;
+}
diff --git a/sha1dc/ubc_check.h b/sha1dc/ubc_check.h
new file mode 100644
index 000000000..27285bdf5
--- /dev/null
+++ b/sha1dc/ubc_check.h
@@ -0,0 +1,35 @@
+/***
+* Copyright 2017 Marc Stevens <marc@marc-stevens.nl>, Dan Shumow <danshu@microsoft.com>
+* Distributed under the MIT Software License.
+* See accompanying file LICENSE.txt or copy at
+* https://opensource.org/licenses/MIT
+***/
+
+// this file was generated by the 'parse_bitrel' program in the tools section
+// using the data files from directory 'tools/data/3565'
+//
+// sha1_dvs contains a list of SHA-1 Disturbance Vectors (DV) to check
+// dvType, dvK and dvB define the DV: I(K,B) or II(K,B) (see the paper)
+// dm[80] is the expanded message block XOR-difference defined by the DV
+// testt is the step from which the recompression is done for collision detection
+// maski and maskb define the bit to check for each DV in the dvmask returned by ubc_check
+//
+// ubc_check takes as input an expanded message block and verifies the unavoidable bitconditions for all listed DVs
+// it returns a dvmask where each bit belonging to a DV is set if all unavoidable bitconditions for that DV have been met
+// thus one needs to do the recompression check for each DV that has its bit set
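+//
+// for example, the sha1_dvs entry {2,52,0,65,0,27, {...}} in ubc_check.c
+// describes DV II(52,0): recompression starts at step 65 and its result is
+// reported in bit 27 of word 0 (maski) of the dvmask filled in by ubc_check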
+
+#ifndef UBC_CHECK_H
+#define UBC_CHECK_H
+
+#include <stdint.h>
+
+#define DVMASKSIZE 1
+typedef struct { int dvType; int dvK; int dvB; int testt; int maski; int maskb; uint32_t dm[80]; } dv_info_t;
+extern dv_info_t sha1_dvs[];
+void ubc_check(const uint32_t W[80], uint32_t dvmask[DVMASKSIZE]);
+
+#define DOSTORESTATE58
+#define DOSTORESTATE65
+
+
+#endif // UBC_CHECK_H
-- 
2.12.0.rc2.629.ga7951ed82


^ permalink raw reply related	[flat|nested] 136+ messages in thread

* [PATCH 2/3] sha1dc: adjust header includes for git
  2017-02-23 23:05                         ` Jeff King
  2017-02-23 23:05                           ` [PATCH 1/3] add collision-detecting sha1 implementation Jeff King
@ 2017-02-23 23:05                           ` Jeff King
  2017-02-23 23:06                           ` [PATCH 3/3] Makefile: add USE_SHA1DC knob Jeff King
                                             ` (2 subsequent siblings)
  4 siblings, 0 replies; 136+ messages in thread
From: Jeff King @ 2017-02-23 23:05 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Joey Hess, Git Mailing List

We can replace system includes with git-compat-util.h (and
should make sure it is included in all .c files). We can
drop includes from headers entirely, as every .c file is
supposed to include git-compat-util itself first.

We also use the full "sha1dc/" path for including related
files. This isn't strictly necessary, but makes the expected
resolution more obvious.
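
As an illustrative sketch of the resulting convention (not itself part
of the diff below), a sha1dc .c file now starts with:

    #include "git-compat-util.h"
    #include "sha1dc/sha1.h"
    #include "sha1dc/ubc_check.h"

and the sha1dc headers no longer pull in any system includes themselves.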

Signed-off-by: Jeff King <peff@peff.net>
---
 sha1dc/sha1.c      | 9 +++------
 sha1dc/sha1.h      | 2 --
 sha1dc/ubc_check.c | 4 ++--
 sha1dc/ubc_check.h | 2 --
 4 files changed, 5 insertions(+), 12 deletions(-)

diff --git a/sha1dc/sha1.c b/sha1dc/sha1.c
index ed2010911..762c6fff8 100644
--- a/sha1dc/sha1.c
+++ b/sha1dc/sha1.c
@@ -5,12 +5,9 @@
 * https://opensource.org/licenses/MIT
 ***/
 
-#include <string.h>
-#include <memory.h>
-#include <stdio.h>
-
-#include "sha1.h"
-#include "ubc_check.h"
+#include "git-compat-util.h"
+#include "sha1dc/sha1.h"
+#include "sha1dc/ubc_check.h"
 
 #define rotate_right(x,n) (((x)>>(n))|((x)<<(32-(n))))
 #define rotate_left(x,n)  (((x)<<(n))|((x)>>(32-(n))))
diff --git a/sha1dc/sha1.h b/sha1dc/sha1.h
index 8b522f9d2..ce5390397 100644
--- a/sha1dc/sha1.h
+++ b/sha1dc/sha1.h
@@ -5,8 +5,6 @@
 * https://opensource.org/licenses/MIT
 ***/
 
-#include <stdint.h>
-
 // uses SHA-1 message expansion to expand the first 16 words of W[] to 80 words
 void sha1_message_expansion(uint32_t W[80]);
 
diff --git a/sha1dc/ubc_check.c b/sha1dc/ubc_check.c
index 556aaf3c5..6bccd4f2b 100644
--- a/sha1dc/ubc_check.c
+++ b/sha1dc/ubc_check.c
@@ -22,8 +22,8 @@
 // a directly verifiable version named ubc_check_verify can be found in ubc_check_verify.c
 // ubc_check has been verified against ubc_check_verify using the 'ubc_check_test' program in the tools section
 
-#include <stdint.h>
-#include "ubc_check.h"
+#include "git-compat-util.h"
+#include "sha1dc/ubc_check.h"
 
 static const uint32_t DV_I_43_0_bit 	= (uint32_t)(1) << 0;
 static const uint32_t DV_I_44_0_bit 	= (uint32_t)(1) << 1;
diff --git a/sha1dc/ubc_check.h b/sha1dc/ubc_check.h
index 27285bdf5..05ff944eb 100644
--- a/sha1dc/ubc_check.h
+++ b/sha1dc/ubc_check.h
@@ -21,8 +21,6 @@
 #ifndef UBC_CHECK_H
 #define UBC_CHECK_H
 
-#include <stdint.h>
-
 #define DVMASKSIZE 1
 typedef struct { int dvType; int dvK; int dvB; int testt; int maski; int maskb; uint32_t dm[80]; } dv_info_t;
 extern dv_info_t sha1_dvs[];
-- 
2.12.0.rc2.629.ga7951ed82


^ permalink raw reply related	[flat|nested] 136+ messages in thread

* [PATCH 3/3] Makefile: add USE_SHA1DC knob
  2017-02-23 23:05                         ` Jeff King
  2017-02-23 23:05                           ` [PATCH 1/3] add collision-detecting sha1 implementation Jeff King
  2017-02-23 23:05                           ` [PATCH 2/3] sha1dc: adjust header includes for git Jeff King
@ 2017-02-23 23:06                           ` Jeff King
  2017-02-24 18:36                             ` HW42
  2017-02-23 23:14                           ` SHA1 collisions found Linus Torvalds
  2017-02-28 18:41                           ` Junio C Hamano
  4 siblings, 1 reply; 136+ messages in thread
From: Jeff King @ 2017-02-23 23:06 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Joey Hess, Git Mailing List

This knob lets you use the sha1dc implementation from:

      https://github.com/cr-marcstevens/sha1collisiondetection

which can detect certain types of collision attacks (even
when we only see half of the colliding pair).

The big downside is that it's slower than either the openssl
or block-sha1 implementations.

Here are some timings based off of linux.git:

  - compute sha1 over whole packfile
    before: 1.349s
     after: 5.067s
    change: +275%

  - rev-list --all
    before: 5.742s
     after: 5.730s
    change: -0.2%

  - rev-list --all --objects
    before: 33.257s
     after: 33.392s
    change: +0.4%

  - index-pack --verify
    before: 2m20s
     after: 5m43s
    change: +145%

  - git log --no-merges -10000 -p
    before: 9.532s
     after: 9.683s
    change: +1.5%

So overall the sha1 computation is about 3-4x slower. But of
course most operations do more than just sha1. Accessing
commits and trees isn't slowed at all (both the +/- changes
there are well within the run-to-run noise). Accessing the
blobs is a little slower, but mostly drowned out by the cost
of things like actually generating patches.

The most-affected operation is `index-pack --verify`, which
is essentially just computing the sha1 on every object. It's
a bit worse than twice as slow, which means every push and
every fetch is going to experience that.
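
As a usage sketch (any non-empty value works, since the Makefile only
checks the knob with "ifdef"):

    make USE_SHA1DC=YesPlease

after which a sha1 computation that detects evidence of a collision
attack will die() instead of quietly accepting the colliding content.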

Signed-off-by: Jeff King <peff@peff.net>
---
 Makefile      | 10 ++++++++++
 sha1dc/sha1.c | 22 ++++++++++++++++++++++
 sha1dc/sha1.h | 16 ++++++++++++++++
 3 files changed, 48 insertions(+)

diff --git a/Makefile b/Makefile
index 8e4081e06..7c4906250 100644
--- a/Makefile
+++ b/Makefile
@@ -142,6 +142,10 @@ all::
 # Define PPC_SHA1 environment variable when running make to make use of
 # a bundled SHA1 routine optimized for PowerPC.
 #
+# Define USE_SHA1DC to unconditionally enable the collision-detecting sha1
+# algorithm. This is slower, but may detect attempted collision attacks.
+# Takes priority over other *_SHA1 knobs.
+#
 # Define SHA1_MAX_BLOCK_SIZE to limit the amount of data that will be hashed
 # in one call to the platform's SHA1_Update(). e.g. APPLE_COMMON_CRYPTO
 # wants 'SHA1_MAX_BLOCK_SIZE=1024L*1024L*1024L' defined.
@@ -1386,6 +1390,11 @@ ifdef APPLE_COMMON_CRYPTO
 	SHA1_MAX_BLOCK_SIZE = 1024L*1024L*1024L
 endif
 
+ifdef USE_SHA1DC
+	SHA1_HEADER = "sha1dc/sha1.h"
+	LIB_OBJS += sha1dc/sha1.o
+	LIB_OBJS += sha1dc/ubc_check.o
+else
 ifdef BLK_SHA1
 	SHA1_HEADER = "block-sha1/sha1.h"
 	LIB_OBJS += block-sha1/sha1.o
@@ -1403,6 +1412,7 @@ else
 endif
 endif
 endif
+endif
 
 ifdef SHA1_MAX_BLOCK_SIZE
 	LIB_OBJS += compat/sha1-chunked.o
diff --git a/sha1dc/sha1.c b/sha1dc/sha1.c
index 762c6fff8..1566ec4c7 100644
--- a/sha1dc/sha1.c
+++ b/sha1dc/sha1.c
@@ -1141,3 +1141,25 @@ int SHA1DCFinal(unsigned char output[20], SHA1_CTX *ctx)
 	output[19] = (unsigned char)(ctx->ihv[4]);
 	return ctx->found_collision;
 }
+
+static const char collision_message[] =
+"The SHA1 computation detected evidence of a collision attack;\n"
+"refusing to process the contents.";
+
+void git_SHA1DCFinal(unsigned char hash[20], SHA1_CTX *ctx)
+{
+	if (SHA1DCFinal(hash, ctx))
+		die(collision_message);
+}
+
+void git_SHA1DCUpdate(SHA1_CTX *ctx, const void *vdata, unsigned long len)
+{
+	const char *data = vdata;
+	/* We expect an unsigned long, but sha1dc only takes an unsigned int */
+	while (len > INT_MAX) {
+		SHA1DCUpdate(ctx, data, INT_MAX);
+		data += INT_MAX;
+		len -= INT_MAX;
+	}
+	SHA1DCUpdate(ctx, data, len);
+}
diff --git a/sha1dc/sha1.h b/sha1dc/sha1.h
index ce5390397..1bb0ace99 100644
--- a/sha1dc/sha1.h
+++ b/sha1dc/sha1.h
@@ -90,3 +90,19 @@ void SHA1DCUpdate(SHA1_CTX*, const char*, unsigned);
 // obtain SHA-1 hash from SHA-1 context
 // returns: 0 = no collision detected, otherwise = collision found => warn user for active attack
 int  SHA1DCFinal(unsigned char[20], SHA1_CTX*); 
+
+
+/*
+ * Same as SHA1DCFinal, but convert collision attack case into a verbose die().
+ */
+void git_SHA1DCFinal(unsigned char [20], SHA1_CTX *);
+
+/*
+ * Same as SHA1DCUpdate, but adjust types to match git's usual interface.
+ */
+void git_SHA1DCUpdate(SHA1_CTX *ctx, const void *data, unsigned long len);
+
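+/*
+ * Route git's SHA1 entry points to the collision-detecting implementation;
+ * the Makefile's USE_SHA1DC knob selects this header via SHA1_HEADER.
+ */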
+#define platform_SHA_CTX SHA1_CTX
+#define platform_SHA1_Init SHA1DCInit
+#define platform_SHA1_Update git_SHA1DCUpdate
+#define platform_SHA1_Final git_SHA1DCFinal
-- 
2.12.0.rc2.629.ga7951ed82

^ permalink raw reply related	[flat|nested] 136+ messages in thread

* Re: SHA1 collisions found
  2017-02-23 23:05                         ` Jeff King
                                             ` (2 preceding siblings ...)
  2017-02-23 23:06                           ` [PATCH 3/3] Makefile: add USE_SHA1DC knob Jeff King
@ 2017-02-23 23:14                           ` Linus Torvalds
  2017-02-28 18:41                           ` Junio C Hamano
  4 siblings, 0 replies; 136+ messages in thread
From: Linus Torvalds @ 2017-02-23 23:14 UTC (permalink / raw)
  To: Jeff King; +Cc: Joey Hess, Git Mailing List

On Thu, Feb 23, 2017 at 3:05 PM, Jeff King <peff@peff.net> wrote:
>
> (By the way, I don't see your version on the list, Linus, which probably
> means it was eaten by the 100K filter).

Ahh. I didn't even think about a size filter.

Doesn't matter, your version looks fine.

           Linus

^ permalink raw reply	[flat|nested] 136+ messages in thread

* Re: [PATCH 1/3] add collision-detecting sha1 implementation
  2017-02-23 23:05                           ` [PATCH 1/3] add collision-detecting sha1 implementation Jeff King
@ 2017-02-23 23:15                             ` Stefan Beller
  2017-02-24  0:01                               ` Jeff King
  0 siblings, 1 reply; 136+ messages in thread
From: Stefan Beller @ 2017-02-23 23:15 UTC (permalink / raw)
  To: Jeff King; +Cc: Linus Torvalds, Joey Hess, Git Mailing List

On Thu, Feb 23, 2017 at 3:05 PM, Jeff King <peff@peff.net> wrote:

> +* Copyright 2017 Marc Stevens <marc@marc-stevens.nl>, Dan Shumow (danshu@microsoft.com)
> +* Distributed under the MIT Software License.
> +* See accompanying file LICENSE.txt or copy at

The accompanying LICENSE file did not make it into this patch; it is
more specialized/verbose than the generic text at
https://opensource.org/licenses/MIT
w.r.t. the copyright notice requirement.

Apart from that, MIT seems to be compatible with the GPL
according to the FSF, though IANAL.

^ permalink raw reply	[flat|nested] 136+ messages in thread

* Re: [PATCH 1/3] add collision-detecting sha1 implementation
  2017-02-23 23:15                             ` Stefan Beller
@ 2017-02-24  0:01                               ` Jeff King
  2017-02-24  0:12                                 ` Linus Torvalds
  0 siblings, 1 reply; 136+ messages in thread
From: Jeff King @ 2017-02-24  0:01 UTC (permalink / raw)
  To: Stefan Beller; +Cc: Linus Torvalds, Joey Hess, Git Mailing List

On Thu, Feb 23, 2017 at 03:15:11PM -0800, Stefan Beller wrote:

> On Thu, Feb 23, 2017 at 3:05 PM, Jeff King <peff@peff.net> wrote:
> 
> > +* Copyright 2017 Marc Stevens <marc@marc-stevens.nl>, Dan Shumow (danshu@microsoft.com)
> > +* Distributed under the MIT Software License.
> > +* See accompanying file LICENSE.txt or copy at
> 
> The accompanying LICENSE file did not make it into this patch,
> that is more specialized/verbose than the one at
> https://opensource.org/licenses/MIT
> w.r.t. copyright notice requirement.

You know, I didn't even look at the LICENSE file, since it said MIT and
had a link here. It would be trivial to copy it over, too, of course.

> Apart from that MIT seems to be compatible with GPL
> according to the FSF, though IANAL.

Yeah, that's always been my understanding.

-Peff

^ permalink raw reply	[flat|nested] 136+ messages in thread

* Re: [PATCH 1/3] add collision-detecting sha1 implementation
  2017-02-24  0:01                               ` Jeff King
@ 2017-02-24  0:12                                 ` Linus Torvalds
  2017-02-24  0:16                                   ` Jeff King
  0 siblings, 1 reply; 136+ messages in thread
From: Linus Torvalds @ 2017-02-24  0:12 UTC (permalink / raw)
  To: Jeff King; +Cc: Stefan Beller, Joey Hess, Git Mailing List

On Thu, Feb 23, 2017 at 4:01 PM, Jeff King <peff@peff.net> wrote:
>
> You know, I didn't even look at the LICENSE file, since it said MIT and
> had a link here. It would be trivial to copy it over, too, of course.

You should do it. It's just good to be careful and clear with
licenses, and the license text does require that the copyright notice
and permission notice be included in copies.

My patch did it. "Pats self on head".

             Linus

PS. And just to be polite, we should probably also just cc at least
Marc Stevens and Dan Shumow if we take that patch further. Their email
addresses are in that LICENSE.txt file.

^ permalink raw reply	[flat|nested] 136+ messages in thread

* Re: [PATCH 1/3] add collision-detecting sha1 implementation
  2017-02-24  0:12                                 ` Linus Torvalds
@ 2017-02-24  0:16                                   ` Jeff King
  0 siblings, 0 replies; 136+ messages in thread
From: Jeff King @ 2017-02-24  0:16 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Stefan Beller, Joey Hess, Git Mailing List

On Thu, Feb 23, 2017 at 04:12:01PM -0800, Linus Torvalds wrote:

> On Thu, Feb 23, 2017 at 4:01 PM, Jeff King <peff@peff.net> wrote:
> >
> > You know, I didn't even look at the LICENSE file, since it said MIT and
> > had a link here. It would be trivial to copy it over, too, of course.
> 
> You should do it. It's just good to be careful and clear with
> licenses, and the license text does require that the copyright notice
> and permission file should be included in copies.
> 
> My patch did it. "Pats self on head".

And that's why yours crossed the 100K barrier. :)

But yeah, I agree it is better to be safe (and that's why we should contact
the authors). I'll point them out-of-band to this thread, and cc them if
it ends up being re-rolled.

-Peff

^ permalink raw reply	[flat|nested] 136+ messages in thread

* Re: SHA1 collisions found
  2017-02-23 16:43 SHA1 collisions found Joey Hess
                   ` (2 preceding siblings ...)
  2017-02-23 17:19 ` Linus Torvalds
@ 2017-02-24  9:42 ` Duy Nguyen
  2017-02-25 19:04   ` brian m. carlson
  2017-02-24 15:13 ` Ian Jackson
  2017-02-24 22:47 ` Jakub Narębski
  5 siblings, 1 reply; 136+ messages in thread
From: Duy Nguyen @ 2017-02-24  9:42 UTC (permalink / raw)
  To: Joey Hess; +Cc: Git Mailing List

On Thu, Feb 23, 2017 at 11:43 PM, Joey Hess <id@joeyh.name> wrote:
> IIRC someone has been working on parameterizing git's SHA1 assumptions
> so a repository could eventually use a more secure hash. How far has
> that gotten? There are still many "40" constants in git.git HEAD.

Michael asked Brian (that "someone") the other day and he replied [1]

>> I'm curious; what fraction of the overall convert-to-object_id campaign
>> do you estimate is done so far? Are you getting close to the promised
>> land yet?
>
> So I think that the current scope left is best estimated by the
> following command:
>
>   git grep -P 'unsigned char\s+(\*|.*20)' | grep -v '^Documentation'
>
> So there are approximately 1200 call sites left, which is quite a bit of
> work.  I estimate between the work I've done and other people's
> refactoring work (such as the refs backend refactor), we're about 40%
> done.
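
(For a feel of what one such conversion looks like, the shape is
roughly the following; illustrative signatures, not a real call site:)

    /*
     * before: a bare 20-byte buffer, SHA-1 baked into every prototype:
     *
     *     int lookup_thing(const unsigned char *sha1, unsigned char *result);
     */

    /* after: the struct the series converts call sites to */
    struct object_id {
        unsigned char hash[20]; /* GIT_SHA1_RAWSZ, for now */
    };

    int lookup_thing(const struct object_id *oid, struct object_id *result);

Once a call site takes a struct object_id, widening or swapping the
hash becomes a matter of changing the struct, not auditing raw
buffers.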

[1] http://public-inbox.org/git/%3C20170217214513.giua5ksuiqqs2laj@genre.crustytoothpaste.net%3E/
-- 
Duy

^ permalink raw reply	[flat|nested] 136+ messages in thread

* Re: SHA1 collisions found
  2017-02-23 16:43 SHA1 collisions found Joey Hess
                   ` (3 preceding siblings ...)
  2017-02-24  9:42 ` Duy Nguyen
@ 2017-02-24 15:13 ` Ian Jackson
  2017-02-24 17:04   ` ankostis
                     ` (2 more replies)
  2017-02-24 22:47 ` Jakub Narębski
  5 siblings, 3 replies; 136+ messages in thread
From: Ian Jackson @ 2017-02-24 15:13 UTC (permalink / raw)
  To: Joey Hess; +Cc: git

Joey Hess writes ("SHA1 collisions found"):
> https://shattered.io/static/shattered.pdf
> https://freedom-to-tinker.com/2017/02/23/rip-sha-1/
> 
> IIRC someone has been working on parameterizing git's SHA1 assumptions
> so a repository could eventually use a more secure hash. How far has
> that gotten? There are still many "40" constants in git.git HEAD.

I have been thinking about how to do a transition from SHA1 to another
hash function.

I have concluded that:

 * We should avoid expecting everyone to rewrite all their
   history.

 * Unfortunately, because the data formats (particularly, the commit
   header) are not in practice extensible (because of the way existing
   code parses them), it is not useful to try to generate new data (new
   commits etc.) containing both new hashes and old hashes: old
   clients will mishandle the new data.

 * Therefore the transition needs to be done by giving every object
   two names (old and new hash function).  Objects may refer to each
   other by either name, but must pick one.  The usual shape of
   project histories will be a pile of old commits referring to each
   other by old names, surmounted by new commits referring to each
   other by new names.

 * It is not possible to solve this problem without extending the
   object name format.  Therefore all software which calls git and
   expects to handle object names will need to be updated.

I have been writing a more detailed transition plan.  I hope to post
this within a few days.

Ian.

^ permalink raw reply	[flat|nested] 136+ messages in thread

* Re: SHA1 collisions found
  2017-02-23 19:13           ` Morten Welinder
@ 2017-02-24 15:52             ` Geert Uytterhoeven
  0 siblings, 0 replies; 136+ messages in thread
From: Geert Uytterhoeven @ 2017-02-24 15:52 UTC (permalink / raw)
  To: Morten Welinder; +Cc: Joey Hess, Linus Torvalds, Git Mailing List

On Thu, Feb 23, 2017 at 8:13 PM, Morten Welinder <mwelinder@gmail.com> wrote:
> The attack seems to generate two 64-bytes blocks, one quarter of which
> is repeated data.  (Table-1 in the paper.)
>
> Assuming the result of that is evenly distributed and that bytes are
> independent, we can estimate the chances that the result is NUL-free
> as (255/256)^192 = 47% and the probability that the result is NUL and
> newline free as (254/256)^192 = 22%.  Clearly one should not rely on
> NULs or newlines to save the day.  On the other hand, the chances of
> an ascii result are something like (95/256)^192 = 10^-83.

Good. So they can replace linux/Documentation/logo.gif, but not actual source
files, not even if they contain hex arrays with "device parameters" ;-)
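
(The arithmetic above is easy to double-check with a standalone
snippet; not git code, compile with -lm:)

    #include <math.h>
    #include <stdio.h>

    int main(void)
    {
        /* 192 attacker-controlled bytes, as estimated above */
        printf("NUL-free:    %.0f%%\n", 100 * pow(255.0 / 256, 192));
        printf("NUL/NL-free: %.0f%%\n", 100 * pow(254.0 / 256, 192));
        printf("all-ASCII:   10^%.0f\n", 192 * log10(95.0 / 256));
        return 0;
    }

which prints 47%, 22% and 10^-83, matching Morten's estimates.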

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

^ permalink raw reply	[flat|nested] 136+ messages in thread

* Re: SHA1 collisions found
  2017-02-24 15:13 ` Ian Jackson
@ 2017-02-24 17:04   ` ankostis
  2017-02-24 17:23   ` Jason Cooper
  2017-02-24 17:32   ` Junio C Hamano
  2 siblings, 0 replies; 136+ messages in thread
From: ankostis @ 2017-02-24 17:04 UTC (permalink / raw)
  To: Ian Jackson; +Cc: git

On 24 February 2017 at 16:13, Ian Jackson
<ijackson@chiark.greenend.org.uk> wrote:
>
> Joey Hess writes ("SHA1 collisions found"):
> > https://shattered.io/static/shattered.pdf
> > https://freedom-to-tinker.com/2017/02/23/rip-sha-1/
> >
> > IIRC someone has been working on parameterizing git's SHA1 assumptions
> > so a repository could eventually use a more secure hash. How far has
> > that gotten? There are still many "40" constants in git.git HEAD.
>
> I have been thinking about how to do a transition from SHA1 to another
> hash function.
>
> I have concluded that:
>
>  * We should avoid expecting everyone to rewrite all their
>    history.
>
>  * Unfortunately, because the data formats (particularly, the commit
>    header) are not in practice extensible (because of the way existing
>    code parses them), it is not useful to try to generate new data (new
>    commits etc.) containing both new hashes and old hashes: old
>    clients will mishandle the new data.
>
>  * Therefore the transition needs to be done by giving every object
>    two names (old and new hash function).  Objects may refer to each
>    other by either name, but must pick one.  The usual shape of
>    project histories will be a pile of old commits referring to each
>    other by old names, surmounted by new commits referring to each
>    other by new names.
>
>  * It is not possible to solve this problem without extending the
>    object name format.  Therefore all software which calls git and
>    expects to handle object names will need to be updated.
>
> I have been writing a more detailed transition plan.  I hope to post
> this within a few days.

It would be great to have a rough plan of the transition to a new hash
function.

We are writing a git-based application to store electronic-files for
legislative purposes for EU.
And one of the great questions we face is about git's SHA-1 validity in 5
or 20 years of time from now.

Is it possible to have an assessment of the situation for this transition?

Best regards for your efforts,
  Kostis

^ permalink raw reply	[flat|nested] 136+ messages in thread

* Re: SHA1 collisions found
  2017-02-24 15:13 ` Ian Jackson
  2017-02-24 17:04   ` ankostis
@ 2017-02-24 17:23   ` Jason Cooper
  2017-02-25 23:22     ` ankostis
  2017-02-24 17:32   ` Junio C Hamano
  2 siblings, 1 reply; 136+ messages in thread
From: Jason Cooper @ 2017-02-24 17:23 UTC (permalink / raw)
  To: Ian Jackson; +Cc: Joey Hess, git

Hi Ian,

On Fri, Feb 24, 2017 at 03:13:37PM +0000, Ian Jackson wrote:
> Joey Hess writes ("SHA1 collisions found"):
> > https://shattered.io/static/shattered.pdf
> > https://freedom-to-tinker.com/2017/02/23/rip-sha-1/
> > 
> > IIRC someone has been working on parameterizing git's SHA1 assumptions
> > so a repository could eventually use a more secure hash. How far has
> > that gotten? There are still many "40" constants in git.git HEAD.
> 
> I have been thinking about how to do a transition from SHA1 to another
> hash function.
> 
> I have concluded that:
> 
>  * We should avoid expecting everyone to rewrite all their
>    history.

Agreed.

>  * Unfortunately, because the data formats (particularly, the commit
>    header) are not in practice extensible (because of the way existing
>    code parses them), it is not useful to try to generate new data (new
>    commits etc.) containing both new hashes and old hashes: old
>    clients will mishandle the new data.

My thought here is:

 a) re-hash blobs with sha256, hardlink to sha1 objects
 b) create new tree objects which are mirrors of each sha1 tree object,
    but purely sha256
 c) mirror commits, but they are also purely sha256
 d) future PGP signed tags would sign both hashes (or include both?)

Which would end up something like:

  .git/
    \... #usual files
    \objects
      \ef
        \3c39f7522dc55a24f64da9febcfac71e984366
    \objects-sha2_256
      \72
        \604fd2de5f25c89d692b01081af93bcf00d2af34549d8d1bdeb68bc048932
    \info
      \...
    \info-sha2_256
      \refs #uses sha256 commit identifiers

Basically, keep the sha256 stuff out of the way for legacy clients, and
new clients will still be able to use it.
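
(A minimal sketch of step a), assuming hypothetical path arguments and
ignoring EEXIST, permissions, and packed objects; real code would need
to handle all of those:)

    #include <stdio.h>
    #include <unistd.h>

    /*
     * Expose an existing loose object under its sha256 name too,
     * without copying: both paths end up naming the same inode.
     */
    int mirror_loose_object(const char *sha1_path, const char *sha256_path)
    {
        if (link(sha1_path, sha256_path) < 0) {
            perror("link");
            return -1;
        }
        return 0;
    }

so .git/objects/ef/3c39... and .git/objects-sha2_256/72/604f... above
would literally share their bytes on disk.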

There shouldn't be a need to re-sign old signed tags if the underlying
objects are counter-hashed.  There might need to be some transition
info, though.

Say a new client does 'git tag -v tags/v3.16' in the kernel tree.  I would
expect it to check the sha1 hashes, verify the PGP signed tag, and then
also check the sha256 counter-hashes of the relevant objects.

thx,

Jason.

^ permalink raw reply	[flat|nested] 136+ messages in thread

* Re: SHA1 collisions found
  2017-02-24 15:13 ` Ian Jackson
  2017-02-24 17:04   ` ankostis
  2017-02-24 17:23   ` Jason Cooper
@ 2017-02-24 17:32   ` Junio C Hamano
  2017-02-24 17:45     ` David Lang
                       ` (3 more replies)
  2 siblings, 4 replies; 136+ messages in thread
From: Junio C Hamano @ 2017-02-24 17:32 UTC (permalink / raw)
  To: Ian Jackson; +Cc: Joey Hess, git

Ian Jackson <ijackson@chiark.greenend.org.uk> writes:

> I have been thinking about how to do a transition from SHA1 to another
> hash function.

Good.  I think many of us have been, too, not necessarily just
in the past few days in response to shattered, but over the last 10
years, yet without coming to a consensus design ;-)

> I have concluded that:
>
>  * We should avoid expecting everyone to rewrite all their
>    history.

Yes.

>  * Unfortunately, because the data formats (particularly, the commit
>    header) are not in practice extensible (because of the way existing
>    code parses them), it is not useful to try to generate new data (new
>    commits etc.) containing both new hashes and old hashes: old
>    clients will mishandle the new data.

Yes.

>  * Therefore the transition needs to be done by giving every object
>    two names (old and new hash function).  Objects may refer to each
>    other by either name, but must pick one.  The usual shape of

I do not think it is necessarily so.  Existing code may not be able
to read anything new, but you can make the new code understand
object names in both formats, and for a smooth transition, I think
the new code needs to.

For example, a new commit that records a merge of an old and a new
commit whose resulting tree happens to be the same as the tree of
the old commit may begin like so:

    tree 21b97d4c4f968d1335f16292f954dfdbb91353f0
    parent 20769079d22a9f8010232bdf6131918c33a1bf6910232bdf6131918c33a1bf69
    parent 22af6fef9b6538c9e87e147a920be9509acf1ddd

naming the only object whose name was done with new hash with the
new longer hash, while recording the names of the other existing
objects with SHA-1.  We would need to extend the object format for
tag (which would be trivial as the object reference is textual and
similar to a commit) and tree (much harder), of course.

As long as the reader can tell in some way from the format of an
object name stored in the "new object format" object what era is
being referred to [*1*], we can name new objects with only the new
hash, I would think.  "new refers only to new", which stratifies
objects into older and newer, may make things simpler, but I am not
convinced yet that it would give our users a smooth enough
transition path (but I am open to being educated and persuaded the
other way).


[Footnote]

*1* In the above toy example, length being 40 vs 64 is used as a
    sign between SHA-1 and the new hash, and careful readers may
    wonder if we should use sha-3,20769079d22... or something like
    that that more explicitly identifies what hash is used, so that
    we can pick a hash whose length is 64 when we transition again.

    I personally do not think such a prefix is necessary during the
    first transition; we will likely adopt a new hash again, and
    at that point that third one can have a prefix to differentiate
    it from the second one.

^ permalink raw reply	[flat|nested] 136+ messages in thread

* Re: SHA1 collisions found
  2017-02-24 17:32   ` Junio C Hamano
@ 2017-02-24 17:45     ` David Lang
  2017-02-24 18:14       ` Junio C Hamano
  2017-02-24 23:39     ` Jeff King
                       ` (2 subsequent siblings)
  3 siblings, 1 reply; 136+ messages in thread
From: David Lang @ 2017-02-24 17:45 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Ian Jackson, Joey Hess, git

On Fri, 24 Feb 2017, Junio C Hamano wrote:

> *1* In the above toy example, length being 40 vs 64 is used as a
>    sign between SHA-1 and the new hash, and careful readers may
>    wonder if we should use sha-3,20769079d22... or something like
>    that that more explicitly identifies what hash is used, so that
>    we can pick a hash whose length is 64 when we transition again.
>
>    I personally do not think such a prefix is necessary during the
>    first transition; we will likely adopt a new hash again, and
>    at that point that third one can have a prefix to differentiate
>    it from the second one.

as the saying goes "in computer science the interesting numbers are 0, 1, and 
many", does it really simplify things much to support 2 hashes vs supporting 
more so that this issue doesn't have to be revisited? (other than selecting new 
hashes over time)

David Lang

^ permalink raw reply	[flat|nested] 136+ messages in thread

* Re: SHA1 collisions found
  2017-02-24 17:45     ` David Lang
@ 2017-02-24 18:14       ` Junio C Hamano
  2017-02-24 18:58         ` Stefan Beller
  0 siblings, 1 reply; 136+ messages in thread
From: Junio C Hamano @ 2017-02-24 18:14 UTC (permalink / raw)
  To: David Lang; +Cc: Ian Jackson, Joey Hess, git

David Lang <david@lang.hm> writes:

> On Fri, 24 Feb 2017, Junio C Hamano wrote:
>
>> *1* In the above toy example, length being 40 vs 64 is used as a
>>    sign between SHA-1 and the new hash, and careful readers may
>>    wonder if we should use sha-3,20769079d22... or something like
>>    that that more explicitly identifies what hash is used, so that
>>    we can pick a hash whose length is 64 when we transition again.
>>
>>    I personally do not think such a prefix is necessary during the
>>    first transition; we will likely adopt a new hash again, and
>>    at that point that third one can have a prefix to differentiate
>>    it from the second one.
>
> as the saying goes "in computer science the interesting numbers are 0,
> 1, and many", does it really simplify things much to support 2 hashes
> vs supporting more so that this issue doesn't have to be revisited?
> (other than selecting new hashes over time)

It seems that I wasn't clear enough, perhaps?  The scheme I outlined
does not have to revisit this issue at all.  It already declares what
you need to do when you add the third one.  

If it is not 40 or 64 bytes long, you just write it out.  If it is
one of these lengths, then you add some identifying prefix or
postfix.  IOW, if the second one is sha-3 and the third one is blake
(both used at 256-bit), then we would have three kinds of names,
written like so:

    20769079d22a9f8010232bdf6131918c33a1bf69
    20769079d22a9f8010232bdf6131918c33a1bf6910232bdf6131918c33a1bf69
    3,20769079d22a9f8010232bdf6131918c33a1bf6910232bdf6131918c33a1bf69

and the readers can well tell that the first one, being 40-chars
long, is SHA-1, the second one, being 64-chars long, is SHA-3, and
the last one, with the prefix '3' (only because that is the third
one officially supported by Git) and being 64-chars long, is blake,
for example.
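
(To make that concrete, a toy classifier over such names; illustrative
only, not git code:)

    #include <ctype.h>
    #include <string.h>

    enum hash_kind { HASH_UNKNOWN, HASH_SHA1, HASH_SECOND, HASH_LATER };

    static int is_hex(const char *s, size_t n)
    {
        while (n--)
            if (!isxdigit((unsigned char)*s++))
                return 0;
        return 1;
    }

    static enum hash_kind classify_name(const char *name)
    {
        size_t len = strlen(name);

        if (len == 40 && is_hex(name, 40))
            return HASH_SHA1;
        if (len == 64 && is_hex(name, 64))
            return HASH_SECOND;
        /* "3,<64-hex>": an ordinal prefix marks later hash functions */
        if (len > 2 && isdigit((unsigned char)name[0]) &&
            name[1] == ',' && is_hex(name + 2, len - 2))
            return HASH_LATER;
        return HASH_UNKNOWN;
    }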

I do not particularly care if it is prefix or postfix or something
else.  A not-so-well-hidden agenda is to avoid inviting people into
thinking that they can use their choice of random hash functions
and claim that their hacked version is still a Git, as long as they
follow the object naming convention.  IOW, if you said something
like:

 * 40-hex is SHA-1 for historical reasons;
 * Others use hash-name, colon, and then N-hex.

you are inviting people to start using

    md5,54ddf8d47340e048166c45f439ce65fd

as object names.

^ permalink raw reply	[flat|nested] 136+ messages in thread

* Re: [PATCH 3/3] Makefile: add USE_SHA1DC knob
  2017-02-23 23:06                           ` [PATCH 3/3] Makefile: add USE_SHA1DC knob Jeff King
@ 2017-02-24 18:36                             ` HW42
  2017-02-24 18:57                               ` Jeff King
  0 siblings, 1 reply; 136+ messages in thread
From: HW42 @ 2017-02-24 18:36 UTC (permalink / raw)
  To: Jeff King; +Cc: Linus Torvalds, Joey Hess, Git Mailing List


[-- Attachment #1.1: Type: text/plain, Size: 1226 bytes --]

Jeff King:
> diff --git a/Makefile b/Makefile
> index 8e4081e06..7c4906250 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -1386,6 +1390,11 @@ ifdef APPLE_COMMON_CRYPTO
>  	SHA1_MAX_BLOCK_SIZE = 1024L*1024L*1024L
>  endif
>  
> +ifdef USE_SHA1DC
> +	SHA1_HEADER = "sha1dc/sha1.h"
> +	LIB_OBJS += sha1dc/sha1.o
> +	LIB_OBJS += sha1dc/ubc_check.o
> +else
>  ifdef BLK_SHA1
>  	SHA1_HEADER = "block-sha1/sha1.h"
>  	LIB_OBJS += block-sha1/sha1.o
> @@ -1403,6 +1412,7 @@ else
>  endif
>  endif
>  endif
> +endif

This sets SHA1_MAX_BLOCK_SIZE and the compiler flags for Apple
CommonCrypto even if the user selects USE_SHA1DC. The same happens for
BLK_SHA1. Is this intended?

> +void git_SHA1DCUpdate(SHA1_CTX *ctx, const void *vdata, unsigned long len)
> +{
> +	const char *data = vdata;
> +	/* We expect an unsigned long, but sha1dc only takes an int */
> +	while (len > INT_MAX) {
> +		SHA1DCUpdate(ctx, data, INT_MAX);
> +		data += INT_MAX;
> +		len -= INT_MAX;
> +	}
> +	SHA1DCUpdate(ctx, data, len);
> +}

I think you can simply change the len parameter from unsigned into
size_t (or unsigned long) in SHA1DCUpdate().
https://github.com/cr-marcstevens/sha1collisiondetection/pull/6


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 854 bytes --]

^ permalink raw reply	[flat|nested] 136+ messages in thread

* Re: [PATCH 3/3] Makefile: add USE_SHA1DC knob
  2017-02-24 18:36                             ` HW42
@ 2017-02-24 18:57                               ` Jeff King
  0 siblings, 0 replies; 136+ messages in thread
From: Jeff King @ 2017-02-24 18:57 UTC (permalink / raw)
  To: HW42; +Cc: Linus Torvalds, Joey Hess, Git Mailing List

On Fri, Feb 24, 2017 at 06:36:00PM +0000, HW42 wrote:

> > +ifdef USE_SHA1DC
> > +	SHA1_HEADER = "sha1dc/sha1.h"
> > +	LIB_OBJS += sha1dc/sha1.o
> > +	LIB_OBJS += sha1dc/ubc_check.o
> > +else
> >  ifdef BLK_SHA1
> >  	SHA1_HEADER = "block-sha1/sha1.h"
> >  	LIB_OBJS += block-sha1/sha1.o
> > @@ -1403,6 +1412,7 @@ else
> >  endif
> >  endif
> >  endif
> > +endif
> 
> This sets SHA1_MAX_BLOCK_SIZE and the compiler flags for Apple
> CommonCrypto even if the user selects USE_SHA1DC. The same happens for
> BLK_SHA1. Is this intended?

No, it's not. I suspect that setting BLK_SHA1 has the same problem in
the current code, then.

> > +void git_SHA1DCUpdate(SHA1_CTX *ctx, const void *vdata, unsigned long len)
> > +{
> > +	const char *data = vdata;
> > +	/* We expect an unsigned long, but sha1dc only takes an int */
> > +	while (len > INT_MAX) {
> > +		SHA1DCUpdate(ctx, data, INT_MAX);
> > +		data += INT_MAX;
> > +		len -= INT_MAX;
> > +	}
> > +	SHA1DCUpdate(ctx, data, len);
> > +}
> 
> I think you can simply change the len parameter from unsigned into
> size_t (or unsigned long) in SHA1DCUpdate().
> https://github.com/cr-marcstevens/sha1collisiondetection/pull/6

Yeah, I agree that is a cleaner solution. My focus was on changing the
(presumably tested) sha1dc code as little as possible.

-Peff

^ permalink raw reply	[flat|nested] 136+ messages in thread

* Re: SHA1 collisions found
  2017-02-24 18:14       ` Junio C Hamano
@ 2017-02-24 18:58         ` Stefan Beller
  2017-02-24 19:20           ` Junio C Hamano
  2017-02-24 20:33           ` Philip Oakley
  0 siblings, 2 replies; 136+ messages in thread
From: Stefan Beller @ 2017-02-24 18:58 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: David Lang, Ian Jackson, Joey Hess, git@vger.kernel.org

On Fri, Feb 24, 2017 at 10:14 AM, Junio C Hamano <gitster@pobox.com> wrote:

> you are inviting people to start using
>
>     md5,54ddf8d47340e048166c45f439ce65fd
>
> as object names.

which might even be okay for specific subsets of operations.
(e.g. all local work including staging things, making local "fixup" commits)

The addressing scheme should not be too hardcoded; we should rather
treat it similarly to the cipher schemes in pgp. The additional complexity that
we have is the longevity of existence of things, though.

^ permalink raw reply	[flat|nested] 136+ messages in thread

* Re: SHA1 collisions found
  2017-02-24 18:58         ` Stefan Beller
@ 2017-02-24 19:20           ` Junio C Hamano
  2017-02-24 20:05             ` ankostis
  2017-02-24 20:05             ` SHA1 collisions found Junio C Hamano
  2017-02-24 20:33           ` Philip Oakley
  1 sibling, 2 replies; 136+ messages in thread
From: Junio C Hamano @ 2017-02-24 19:20 UTC (permalink / raw)
  To: Stefan Beller; +Cc: David Lang, Ian Jackson, Joey Hess, git@vger.kernel.org

Stefan Beller <sbeller@google.com> writes:

> On Fri, Feb 24, 2017 at 10:14 AM, Junio C Hamano <gitster@pobox.com> wrote:
>
>> you are inviting people to start using
>>
>>     md5,54ddf8d47340e048166c45f439ce65fd
>>
>> as object names.
>
> which might even be okay for specific subsets of operations.
> (e.g. all local work including staging things, making local "fixup" commits)
>
> The addressing scheme should not be too hardcoded; we should rather
> treat it similarly to the cipher schemes in pgp. The additional complexity that
> we have is the longevity of existence of things, though.

The not-so-well-hidden agenda was exactly that we _SHOULD_ not
mimic PGP.  They do not have a requirement to encourage everybody
to use the same thing because each message is encrypted/signed
independently, i.e. they do not have to chain things like we do.


^ permalink raw reply	[flat|nested] 136+ messages in thread

* Re: SHA1 collisions found
  2017-02-24 19:20           ` Junio C Hamano
@ 2017-02-24 20:05             ` ankostis
  2017-02-24 20:32               ` Junio C Hamano
  2017-02-24 20:05             ` SHA1 collisions found Junio C Hamano
  1 sibling, 1 reply; 136+ messages in thread
From: ankostis @ 2017-02-24 20:05 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Stefan Beller, David Lang, Ian Jackson, Joey Hess,
	git@vger.kernel.org

On 24 February 2017 at 20:20, Junio C Hamano <gitster@pobox.com> wrote:
> Stefan Beller <sbeller@google.com> writes:
>
>> On Fri, Feb 24, 2017 at 10:14 AM, Junio C Hamano <gitster@pobox.com> wrote:
>>
>>> you are inviting people to start using
>>>
>>>     md5,54ddf8d47340e048166c45f439ce65fd
>>>
>>> as object names.
>>
>> which might even be okay for specific subsets of operations.
>> (e.g. all local work including staging things, making local "fixup" commits)
>>
>> The addressing scheme should not be too hardcoded; we should rather
>> treat it similarly to the cipher schemes in pgp. The additional complexity that
>> we have is the longevity of existence of things, though.
>
> The not-so-well-hidden agenda was exactly that we _SHOULD_ not
> mimic PGP.  They do not have a requirement to encourage everybody
> to use the same thing because each message is encrypted/signed
> independently, i.e. they do not have to chain things like we do.

But there is a scenario where supporting more hashes, in parallel, is
beneficial:

Let's assume that git is retrofitted to always support the "default"
SHA-3, but support additionally more hash-funcs.
If in the future SHA-3 also gets defeated, it would be highly unlikely
that the same math would also break e.g. Blake.
So certain high-profile repos might choose for extra security 2 or more hashes.

Apologies if I'm misusing the list,
  Kostis

^ permalink raw reply	[flat|nested] 136+ messages in thread

* Re: SHA1 collisions found
  2017-02-24 19:20           ` Junio C Hamano
  2017-02-24 20:05             ` ankostis
@ 2017-02-24 20:05             ` Junio C Hamano
  1 sibling, 0 replies; 136+ messages in thread
From: Junio C Hamano @ 2017-02-24 20:05 UTC (permalink / raw)
  To: Stefan Beller; +Cc: David Lang, Ian Jackson, Joey Hess, git@vger.kernel.org

Junio C Hamano <gitster@pobox.com> writes:

> The not-so-well-hidden agenda was exactly that we _SHOULD_ not
> mimic PGP.  They do not have a requirement to encourage everybody
> to use the same thing because each message is encrypted/signed
> independently, i.e. they do not have to chain things like we do.

To put it less succinctly, PGP does not have an incentive to encourage
everybody to converge on the same thing.  They can afford to say "You can
use whatever you among your circles agree to use and the rest of the
world won't care".  If two groups that have used different ones later
meet, both of them can switch to a common one from that point forward,
but their past exchanges won't affect the future.

You cannot say the same thing for Git.  Once you decide to merge two
histories from two camps, which may have originated from the same
codebase but then decided to use two different ones while they were
forked, you'd be forced to support all three forever.  We have a lot
stronger incentive to discourage fragmentation.




^ permalink raw reply	[flat|nested] 136+ messages in thread

* Re: SHA1 collisions found
  2017-02-24 20:05             ` ankostis
@ 2017-02-24 20:32               ` Junio C Hamano
  2017-02-25  0:31                 ` ankostis
  0 siblings, 1 reply; 136+ messages in thread
From: Junio C Hamano @ 2017-02-24 20:32 UTC (permalink / raw)
  To: ankostis
  Cc: Stefan Beller, David Lang, Ian Jackson, Joey Hess,
	git@vger.kernel.org

ankostis <ankostis@gmail.com> writes:

>> Let's assume that git is retrofitted to always support the "default"
> SHA-3, but support additionally more hash-funcs.
> If in the future SHA-3 also gets defeated, it would be highly unlikely
> that the same math would also break e.g. Blake.
> So certain high-profile repos might choose for extra security 2 or more hashes.

I think you are conflating two unrelated things.

 * How are these "2 or more hashes" actually used?  Are you going to
   add three "parent " lines to a commit with just one parent, each
   line storing the different hashes?  How will such a commit object
   be named---does it have three names and do you plan to have three
   copies of .git/refs/heads/master somehow, each of which has
   SHA-1, SHA-3 and Blake, and let any one hash identify the
   object?

   I suspect you are not going to do so; instead, you would use a
   very long string that is a concatenation of these three hashes as
   if it is an output from a single hash function that produces a
   long result.

   So I think the most natural way to do the "2 or more for extra
   security" is to allow us to use a very long hash.  It does not
   help to allow an object to be referred to with any of these 2 or
   more hashes at the same time.

 * If employing 2 or more hashes by combining them into one enhances
   the security, that is wonderful.  But we want to discourage
   people from inventing their own combinations left and right and
   end up fragmenting the world.  If a project that begins with
   SHA-1 only naming is forked to two (or more) and each fork uses
   different hashes, merging them back will become harder than
   necessary unless you support all the hashes the forks used.

Having said all that, the way to figure out the hash used in the way
we spell the object name may not be the best place to discourage
people from using random hashes of their choice.  But I think we
want to avoid doing something that would actively encourage
fragmentation.


^ permalink raw reply	[flat|nested] 136+ messages in thread

* Re: SHA1 collisions found
  2017-02-24 18:58         ` Stefan Beller
  2017-02-24 19:20           ` Junio C Hamano
@ 2017-02-24 20:33           ` Philip Oakley
  1 sibling, 0 replies; 136+ messages in thread
From: Philip Oakley @ 2017-02-24 20:33 UTC (permalink / raw)
  To: Stefan Beller, Junio C Hamano; +Cc: David Lang, Ian Jackson, Joey Hess, git

From: "Stefan Beller" <sbeller@google.com>
> On Fri, Feb 24, 2017 at 10:14 AM, Junio C Hamano <gitster@pobox.com> 
> wrote:
>
>> you are inviting people to start using
>>
>>     md5,54ddf8d47340e048166c45f439ce65fd
>>
>> as object names.
>
> which might even be okay for specific subsets of operations.
> (e.g. all local work including staging things, making local "fixup" 
> commits)
>
> The addressing scheme should not be too hardcoded, we should rather
> treat it similar to the cipher schemes in pgp. The additional complexity 
> that
> we have is the longevity of existence of things, though.
>

One potential nicety of using the md5 is that it is a known `toy problem` 
solution that could be used to explore how things might be made to work, 
without any expectation that the temporary code is in any way an 
experimental part of regular code. Maybe. It's good to have a toy problem to 
work on.

There are other issue to be considered as well, such as validating a 
transition of identical blobs and trees (at some point there will for some 
users be a forced update of hash of unchanged code), which probably requires 
two way traversal.

Philip 


^ permalink raw reply	[flat|nested] 136+ messages in thread

* Re: SHA1 collisions found
  2017-02-23 16:43 SHA1 collisions found Joey Hess
                   ` (4 preceding siblings ...)
  2017-02-24 15:13 ` Ian Jackson
@ 2017-02-24 22:47 ` Jakub Narębski
  2017-02-24 22:53   ` Santiago Torres
  2017-02-24 23:06   ` Jeff King
  5 siblings, 2 replies; 136+ messages in thread
From: Jakub Narębski @ 2017-02-24 22:47 UTC (permalink / raw)
  To: Joey Hess, git

I have just read on ArsTechnica[1] that while a Git repository could be
corrupted (though this would require attackers to spend a great amount
of resources creating their own collision, which, as said elsewhere
in this thread, is allegedly easy to detect), putting two proof-of-concept
different PDFs with the same size and SHA-1 actually *breaks* Subversion.
The repository can become corrupt, and stop accepting new commits.

From what I understand, people tried this, and Git doesn't exhibit
such a problem.  I wonder what assumptions SVN made that were broken...

The https://shattered.io/ page updated their Q&A section with this
information.

BTW, what's with that page's use of "GIT" instead of "Git"??


[1]: https://arstechnica.com/security/2017/02/watershed-sha1-collision-just-broke-the-webkit-repository-others-may-follow/ 
     "Watershed SHA1 collision just broke the WebKit repository, others may follow"

^ permalink raw reply	[flat|nested] 136+ messages in thread

* Re: SHA1 collisions found
  2017-02-24 22:47 ` Jakub Narębski
@ 2017-02-24 22:53   ` Santiago Torres
  2017-02-24 23:05     ` Jakub Narębski
  2017-02-24 23:06   ` Jeff King
  1 sibling, 1 reply; 136+ messages in thread
From: Santiago Torres @ 2017-02-24 22:53 UTC (permalink / raw)
  To: Jakub Narębski; +Cc: Joey Hess, git

[-- Attachment #1: Type: text/plain, Size: 736 bytes --]

On Fri, Feb 24, 2017 at 11:47:46PM +0100, Jakub Narębski wrote:
> I have just read on ArsTechnica[1] that while a Git repository could be
> corrupted (though this would require attackers to spend a great amount
> of resources creating their own collision, which, as said elsewhere
> in this thread, is allegedly easy to detect), putting two proof-of-concept
> different PDFs with the same size and SHA-1 actually *breaks* Subversion.
> The repository can become corrupt, and stop accepting new commits.

From what I understood in the thread[1], it was the combination of svn +
git-svn together. I think Arstechnica may be a little bit
sensationalistic here.

Cheers!
-Santiago.

[1] https://bugs.webkit.org/show_bug.cgi?id=168774#c27

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 136+ messages in thread

* Re: SHA1 collisions found
  2017-02-24 22:53   ` Santiago Torres
@ 2017-02-24 23:05     ` Jakub Narębski
  2017-02-24 23:24       ` Øyvind A. Holm
  0 siblings, 1 reply; 136+ messages in thread
From: Jakub Narębski @ 2017-02-24 23:05 UTC (permalink / raw)
  To: Santiago Torres; +Cc: Joey Hess, git

On 24.02.2017 at 23:53, Santiago Torres wrote:
> On Fri, Feb 24, 2017 at 11:47:46PM +0100, Jakub Narębski wrote:
>>
>> I have just read on ArsTechnica[1] that while a Git repository could be
>> corrupted (though this would require attackers to spend a great amount
>> of resources creating their own collision, which, as said elsewhere
>> in this thread, is allegedly easy to detect), putting two proof-of-concept
>> different PDFs with the same size and SHA-1 actually *breaks* Subversion.
>> The repository can become corrupt, and stop accepting new commits.
> 
> From what I understood in the thread[1], it was the combination of svn +
> git-svn together. I think Arstechnica may be a little bit
> sensationalistic here.
 
> [1] https://bugs.webkit.org/show_bug.cgi?id=168774#c27

Thanks for the link.  It looks like the problem was with svn itself
(couldn't checkout, couldn't sync), but the repository is recovered now,
though not protected against the problem occurring again.

Well, anyone with Subversion installed (so not me) can check it
for himself/herself... though better to do this with a separate svnroot.


Note that the breakage was an accident, from trying to add a test case
for SHA-1 collisions in the WebKit cache.
 
Best regards,
-- 
Jakub Narębski

^ permalink raw reply	[flat|nested] 136+ messages in thread

* Re: SHA1 collisions found
  2017-02-24 22:47 ` Jakub Narębski
  2017-02-24 22:53   ` Santiago Torres
@ 2017-02-24 23:06   ` Jeff King
  2017-02-24 23:35     ` Jakub Narębski
                       ` (2 more replies)
  1 sibling, 3 replies; 136+ messages in thread
From: Jeff King @ 2017-02-24 23:06 UTC (permalink / raw)
  To: Jakub Narębski; +Cc: Joey Hess, git

On Fri, Feb 24, 2017 at 11:47:46PM +0100, Jakub Narębski wrote:

> I have just read on ArsTechnica[1] that while a Git repository could be
> corrupted (though this would require attackers to spend a great amount
> of resources creating their own collision, which, as said elsewhere
> in this thread, is allegedly easy to detect), putting two proof-of-concept
> different PDFs with the same size and SHA-1 actually *breaks* Subversion.
> The repository can become corrupt, and stop accepting new commits.
> 
> From what I understand, people tried this, and Git doesn't exhibit
> such a problem.  I wonder what assumptions SVN made that were broken...

To be clear, nobody has generated a sha1 collision in Git yet, and you
cannot blindly use the shattered PDFs to do so. Git's notion of the
SHA-1 of an object includes the header, so somebody would have to do a
shattered-level collision search for something that starts with the
correct "blob 1234\0" header.

So we don't actually know how Git would behave in the face of a SHA-1
collision. It would be pretty easy to simulate it with something like:

---
diff --git a/block-sha1/sha1.c b/block-sha1/sha1.c
index 22b125cf8..1be5b5ba3 100644
--- a/block-sha1/sha1.c
+++ b/block-sha1/sha1.c
@@ -231,6 +231,16 @@ void blk_SHA1_Update(blk_SHA_CTX *ctx, const void *data, unsigned long len)
 		memcpy(ctx->W, data, len);
 }
 
+/* sha1 of blobs containing "foo\n" and "bar\n" */
+static const unsigned char foo_sha1[] = {
+	0x25, 0x7c, 0xc5, 0x64, 0x2c, 0xb1, 0xa0, 0x54, 0xf0, 0x8c,
+	0xc8, 0x3f, 0x2d, 0x94, 0x3e, 0x56, 0xfd, 0x3e, 0xbe, 0x99
+};
+static const unsigned char bar_sha1[] = {
+	0x57, 0x16, 0xca, 0x59, 0x87, 0xcb, 0xf9, 0x7d, 0x6b, 0xb5,
+	0x49, 0x20, 0xbe, 0xa6, 0xad, 0xde, 0x24, 0x2d, 0x87, 0xe6
+};
+
 void blk_SHA1_Final(unsigned char hashout[20], blk_SHA_CTX *ctx)
 {
 	static const unsigned char pad[64] = { 0x80 };
@@ -248,4 +258,8 @@ void blk_SHA1_Final(unsigned char hashout[20], blk_SHA_CTX *ctx)
 	/* Output hash */
 	for (i = 0; i < 5; i++)
 		put_be32(hashout + i * 4, ctx->H[i]);
+
+	/* pretend "foo" and "bar" collide */
+	if (!memcmp(hashout, bar_sha1, 20))
+		memcpy(hashout, foo_sha1, 20);
 }

^ permalink raw reply related	[flat|nested] 136+ messages in thread

* Re: SHA1 collisions found
  2017-02-24 23:05     ` Jakub Narębski
@ 2017-02-24 23:24       ` Øyvind A. Holm
  0 siblings, 0 replies; 136+ messages in thread
From: Øyvind A. Holm @ 2017-02-24 23:24 UTC (permalink / raw)
  To: Jakub Narębski; +Cc: Santiago Torres, Joey Hess, git

[-- Attachment #1: Type: text/plain, Size: 2026 bytes --]

On 2017-02-25 00:05:34, Jakub Narębski wrote:
> On 24.02.2017 at 23:53, Santiago Torres wrote:
> > On Fri, Feb 24, 2017 at 11:47:46PM +0100, Jakub Narębski wrote:
> > > I have just read on ArsTechnica[1] that while a Git repository could
> > > be corrupted (though this would require attackers to spend a great
> > > amount of resources creating their own collision, which, as said
> > > elsewhere in this thread, is allegedly easy to detect), putting two
> > > proof-of-concept different PDFs with the same size and SHA-1 actually
> > > *breaks* Subversion. The repository can become corrupt, and stop
> > > accepting new commits.
> >
> > From what I understood in the thread[1], it was the combination of 
> > svn + git-svn together. I think Arstechnica may be a little bit 
> > sensationalistic here.
>
> > [1] https://bugs.webkit.org/show_bug.cgi?id=168774#c27
>
> Thanks for the link.  It looks like the problem was with svn itself 
> (couldn't checkout, couldn't sync), but the repository is recovered now,
> though not protected against the problem occurring again.
>
> Well, anyone with Subversion installed (so not me) can check it for 
> himself/herself... though better to do this with a separate svnroot.

I tested this yesterday by adding the two PDF files to a Subversion 
repository, and found that it wasn't able to clone ("checkout" in svn 
speak) the repository after the two files had been committed. I posted 
the results to the svn-dev mailing list, the thread is at 
<https://svn.haxx.se/dev/archive-2017-02/0142.shtml>.

It seems as it only breaks the working copy because the pristine copies 
are identified with a SHA1 sum, but the FSFS repository backend seems to 
cope with it.

Regards,
Øyvind

+-| Øyvind A. Holm <sunny@sunbase.org> - N 60.37604° E 5.33339° |-+
| OpenPGP: 0xFB0CBEE894A506E5 - http://www.sunbase.org/pubkey.asc |
| Fingerprint: A006 05D6 E676 B319 55E2  E77E FB0C BEE8 94A5 06E5 |
+------------| 41517b2c-fae7-11e6-9521-db5caa6d21d3 |-------------+

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

^ permalink raw reply	[flat|nested] 136+ messages in thread

* Re: SHA1 collisions found
  2017-02-24 23:06   ` Jeff King
@ 2017-02-24 23:35     ` Jakub Narębski
  2017-02-25 22:35     ` Lars Schneider
  2017-02-26 18:57     ` Thomas Braun
  2 siblings, 0 replies; 136+ messages in thread
From: Jakub Narębski @ 2017-02-24 23:35 UTC (permalink / raw)
  To: Jeff King; +Cc: Joey Hess, git

On 25.02.2017 at 00:06, Jeff King wrote:
> On Fri, Feb 24, 2017 at 11:47:46PM +0100, Jakub Narębski wrote:
> 
>> I have just read on ArsTechnica[1] that while a Git repository could be
>> corrupted (though this would require attackers to spend a great amount
>> of resources creating their own collision, which, as said elsewhere
>> in this thread, is allegedly easy to detect), putting two proof-of-concept
>> different PDFs with the same size and SHA-1 actually *breaks* Subversion.
>> The repository can become corrupt, and stop accepting new commits.
>>
>> From what I understand, people tried this, and Git doesn't exhibit
>> such a problem.  I wonder what assumptions SVN made that were broken...
> 
> To be clear, nobody has generated a sha1 collision in Git yet, and you
> cannot blindly use the shattered PDFs to do so. Git's notion of the
> SHA-1 of an object includes the header, so somebody would have to do a
> shattered-level collision search for something that starts with the
> correct "blob 1234\0" header.

What I meant by "Git doesn't exhibit such a problem" (but was not clear
enough) is that Git doesn't break by just adding the SHAttered.io PDFs
(which somebody had checked), but needs a customized attack.

> 
> So we don't actually know how Git would behave in the face of a SHA-1
> collision. It would be pretty easy to simulate it with something like:

You are right that it would be good to know if such a Git-geared customized
SHA-1 attack would break Git, or would simply corrupt it (visibly
or not).

> 
> ---
> diff --git a/block-sha1/sha1.c b/block-sha1/sha1.c
> index 22b125cf8..1be5b5ba3 100644
> --- a/block-sha1/sha1.c
> +++ b/block-sha1/sha1.c
> @@ -231,6 +231,16 @@ void blk_SHA1_Update(blk_SHA_CTX *ctx, const void *data, unsigned long len)
>  		memcpy(ctx->W, data, len);
>  }
>  
> +/* sha1 of blobs containing "foo\n" and "bar\n" */
> +static const unsigned char foo_sha1[] = {
> +	0x25, 0x7c, 0xc5, 0x64, 0x2c, 0xb1, 0xa0, 0x54, 0xf0, 0x8c,
> +	0xc8, 0x3f, 0x2d, 0x94, 0x3e, 0x56, 0xfd, 0x3e, 0xbe, 0x99
> +};
> +static const unsigned char bar_sha1[] = {
> +	0x57, 0x16, 0xca, 0x59, 0x87, 0xcb, 0xf9, 0x7d, 0x6b, 0xb5,
> +	0x49, 0x20, 0xbe, 0xa6, 0xad, 0xde, 0x24, 0x2d, 0x87, 0xe6
> +};
> +
>  void blk_SHA1_Final(unsigned char hashout[20], blk_SHA_CTX *ctx)
>  {
>  	static const unsigned char pad[64] = { 0x80 };
> @@ -248,4 +258,8 @@ void blk_SHA1_Final(unsigned char hashout[20], blk_SHA_CTX *ctx)
>  	/* Output hash */
>  	for (i = 0; i < 5; i++)
>  		put_be32(hashout + i * 4, ctx->H[i]);
> +
> +	/* pretend "foo" and "bar" collide */
> +	if (!memcmp(hashout, bar_sha1, 20))
> +		memcpy(hashout, foo_sha1, 20);
>  }
> 


^ permalink raw reply	[flat|nested] 136+ messages in thread

* Re: SHA1 collisions found
  2017-02-24 17:32   ` Junio C Hamano
  2017-02-24 17:45     ` David Lang
@ 2017-02-24 23:39     ` Jeff King
  2017-02-25  0:39       ` Linus Torvalds
  2017-02-25  1:00       ` David Lang
  2017-02-24 23:43     ` Ian Jackson
  2017-02-25 18:50     ` brian m. carlson
  3 siblings, 2 replies; 136+ messages in thread
From: Jeff King @ 2017-02-24 23:39 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Ian Jackson, Joey Hess, git

On Fri, Feb 24, 2017 at 09:32:13AM -0800, Junio C Hamano wrote:

> >  * Therefore the transition needs to be done by giving every object
> >    two names (old and new hash function).  Objects may refer to each
> >    other by either name, but must pick one.  The usual shape of
> 
> I do not think it is necessarily so.  Existing code may not be able
> to read anything new, but you can make the new code understand
> object names in both formats, and for a smooth transition, I think
> the new code needs to.
> 
> For example, a new commit that records a merge of an old and a new
> commit whose resulting tree happens to be the same as the tree of
> the old commit may begin like so:
> 
>     tree 21b97d4c4f968d1335f16292f954dfdbb91353f0
>     parent 20769079d22a9f8010232bdf6131918c33a1bf6910232bdf6131918c33a1bf69
>     parent 22af6fef9b6538c9e87e147a920be9509acf1ddd
> 
> naming the only object whose name was done with new hash with the
> new longer hash, while recording the names of the other existing
> objects with SHA-1.  We would need to extend the object format for
> tag (which would be trivial as the object reference is textual and
> similar to a commit) and tree (much harder), of course.

One thing I worry about in a mixed-hash setting is how often the two
will be mixed. That will lead to interoperability complications, but I
also think it creates security hazards (if I can convince you somehow to
refer to my evil colliding file by its sha1, for example, then I can
subvert the strength of the new hash).

So I'd much rather see strong rules like:

  1. Once a repo has flag-day switched over to the new hash format[1],
     new references are _always_ done with the new hash. Even ones that
     point to pre-flag-day objects!

     So you get a "commit-v2" object instead of a "commit", and it has a
     distinct hash identity from its "commit" counterpart. You can point
     to a classic "commit", but you do so by its new-hash.

     The flag-day switch would probably be a repo config flag based on
     repositoryformatversion (so old versions would just punt if they
     see it). Let's call this flag "newhash" for lack of a better term.

  2. Repos that have new-hash set will consider the new hash
     format as primary, and always use it when writing and referring to
     new objects (e.g., in refs). A (purely local) sha1->new mapping can
     be maintained for doing old-style object lookups, or for quick
     equivalence checks (this mapping might need to be bi-directional
     for some use cases; I haven't thought hard enough about it to say
     either way; a toy sketch of such a table follows after the list).

  3. For protocol interop, the rules would be something like[2]:

      a. If upload-pack is serving a newhash repo, it advertises
         so in the capabilities.

	 Recent clients know that the rest of the conversation will
	 involve the new hash format. If they're cloning, they set the
	 newhash flag in their local config.  If they're fetching, they
	 probably abort and say "please enable newhash" (because for an
	 existing repo, it probably needs to migrate refs, for example).

	 An old client would fail to send back the newhash capability,
	 and the server would abort the conversation at that point.

	 A new upload-pack serving a non-newhash repo behaves the same
	 as now (use sha1, happily interoperate with existing and new
	 clients).

      b. receive-pack is more or less the mirror image.

         A server for a newhash-flagged repo has a capability for "this
	 is a newhash repo" and advertises newhash refs. An existing
	 client might still try to push, but the server would reject it
	 unless it advertises "newhash" back to the server.

	 A newhash-enabled client on a non-newhash repo would abort more
	 gracefully ("please upgrade your local repo to newhash").

	 For a newhash-enabled server with a non-newhash repo, it would
	 probably not advertise anything (not even "I understand
	 newhash"). Because the process for converting to newhash is not
	 "just push some newhash objects", but an out-of-band flag-day
	 to convert it over.
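
(The toy sketch of item 2's purely-local mapping; a linear table purely
for illustration, where real code would want an indexed, mmap-able file
and probably the reverse direction too:)

    #include <stddef.h>
    #include <string.h>

    struct hash_pair {
        unsigned char sha1[20];
        unsigned char newhash[32]; /* assuming a 256-bit successor */
    };

    /* return the new-format name for a legacy sha1, or NULL if unknown */
    const unsigned char *sha1_to_newhash(const struct hash_pair *map,
                                         size_t nr,
                                         const unsigned char *sha1)
    {
        size_t i;

        for (i = 0; i < nr; i++)
            if (!memcmp(map[i].sha1, sha1, 20))
                return map[i].newhash;
        return NULL;
    }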

That's just a sketch I came up with. There are probably holes. And it
definitely leaves a lot of _possible_ interoperability on the table in
favor of the flag-day approach. But I think the flag-day approach is a
lot easier to reason about. Both in the code, and in terms of the
security properties.

-Peff

[1] I was intentionally vague on "new hash format" here. Obviously there
    are various contenders like SHA-256. But I think there's also an
    open question of whether the new format should be a multi-hash
    format. That would ease further transitions. At the same time, we
    really _don't_ want people picking bespoke hashes for their
    repositories. It creates complications in the code, and it destroys
    a bunch of optimizations (like knowing when we are both talking
    about the same object based on the hash).

    So I am torn between "move to SHA-256 (or whatever)" and "move to a
    hash format that encodes the hash-type in the first byte, but refuse
    to allocate more than one hash for now".

[2] If we're having a flag-day event, this _might_ be time to consider
    some of the breaking protocol changes that have been under
    discussion.  I'm really hesitant to complicate this already-tricky
    issue by throwing in the kitchen sink. But if there's going to be a
    flag day where you need to upgrade Git to access certain repos, it
    might be nice if there's only one. I dunno.

^ permalink raw reply	[flat|nested] 136+ messages in thread

* Re: SHA1 collisions found
  2017-02-24 17:32   ` Junio C Hamano
  2017-02-24 17:45     ` David Lang
  2017-02-24 23:39     ` Jeff King
@ 2017-02-24 23:43     ` Ian Jackson
  2017-02-25  0:06       ` Ian Jackson
  2017-02-25 18:50     ` brian m. carlson
  3 siblings, 1 reply; 136+ messages in thread
From: Ian Jackson @ 2017-02-24 23:43 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Joey Hess, git

Junio C Hamano writes ("Re: SHA1 collisions found"):
> Ian Jackson <ijackson@chiark.greenend.org.uk> writes:
> >  * Therefore the transition needs to be done by giving every object
> >    two names (old and new hash function).  Objects may refer to each
> >    other by either name, but must pick one.  The usual shape of
> 
> I do not think it is necessarily so.

Indeed.  And my latest thoughts involve instead having two parallel
systems of old and new objects.

> *1* In the above toy example, length being 40 vs 64 is used as a
>     sign between SHA-1 and the new hash, and careful readers may
>     wonder if we should use sha-3,20769079d22... or something like
>     that that more explicity identifies what hash is used, so that
>     we can pick a hash whose length is 64 when we transition again.

I have an idea for this.  I think we should prefix new hashes with a
single uppercase letter, probably H.

Uppercase because: case-only-distinguished ref names are already
discouraged because they do not work properly on case-insensitive
filesystems; convention is that ref names are lowercase; so an
uppercase letter probably won't appear at the start of a ref name
component even though almost all existing software will treat it as
legal.  So the result is that the new object names are unlikely to
collide with ref names.

(There is of course no need to store the H as a literal in filenames,
so the case-insensitive filesystem problem does not apply to ref
names.)

We should definitely not introduce new punctuation into object names.
That will cause a great deal of grief for existing software which has
to handle git object names and may try to store them in
representations which assume that they match \w+.

The idea of using the length is a neat trick, but it cannot support
the dcurrent object name abbreviation approach unworkable.

Ian.

^ permalink raw reply	[flat|nested] 136+ messages in thread

* Re: SHA1 collisions found
  2017-02-24 23:43     ` Ian Jackson
@ 2017-02-25  0:06       ` Ian Jackson
  0 siblings, 0 replies; 136+ messages in thread
From: Ian Jackson @ 2017-02-25  0:06 UTC (permalink / raw)
  To: Junio C Hamano, Joey Hess, git

Ian Jackson writes ("Re: SHA1 collisions found"):
> The idea of using the length is a neat trick, but it cannot support
> the dcurrent object name abbreviation approach unworkable.

Sorry, it's late here and my grammar seems to have disintegrated !

Ian.

^ permalink raw reply	[flat|nested] 136+ messages in thread

* Re: SHA1 collisions found
  2017-02-24 20:32               ` Junio C Hamano
@ 2017-02-25  0:31                 ` ankostis
  2017-02-26  0:16                   ` Jason Cooper
  0 siblings, 1 reply; 136+ messages in thread
From: ankostis @ 2017-02-25  0:31 UTC (permalink / raw)
  To: Junio C Hamano, git
  Cc: Stefan Beller, David Lang, Ian Jackson, Joey Hess, git

On 24 February 2017 at 21:32, Junio C Hamano <gitster@pobox.com> wrote:
> ankostis <ankostis@gmail.com> writes:
>
>> Let's assume that git is retrofitted to always support the "default"
>> SHA-3, but support additionally more hash-funcs.
>> If in the future SHA-3 also gets defeated, it would be highly unlikely
>> that the same math would also break e.g. Blake.
>> So certain high-profile repos might choose for extra security 2 or more hashes.
>
> I think you are conflating two unrelated things.

I believe the two distinct things you refer to below are these:

  a. storing objects in the filesystem and accessing them
     by name (e.g. from cmdline), and

  b. cross-referencing inside the objects (trees, tags, notes),

correct?

If not, then please ignore my answers, below.


>  * How are these "2 or more hashes" actually used?  Are you going to
>    add three "parent " lines to a commit with just one parent, each
>    line storing the different hashes?

Yes, in all places where references are involved (tags, notes).
Based on what the git-hackers have written so far, this might be doable.

To ensure integrity in the case of crypto-failures, all objects must
cross-reference each other with multiple hashes.
Of course this extra security would stop as soon as you reach "old"
history (unless you re-write it).


>    How will such a commit object
>    be named---does it have three names and do you plan to have three
>    copies of .git/refs/heads/master somehow, each of which has
>    SHA-1, SHA-3 and Blake, and let any one hash identify the
>    object?

Yes, based on Jason Cooper's idea, above, objects would be stored
under all names in the filesystem using hard links (although this
might not work nicely on Windows).


>    I suspect you are not going to do so; instead, you would use a
>    very long string that is a concatenation of these three hashes as
>    if it is an output from a single hash function that produces a
>    long result.
>
>    So I think the most natural way to do the "2 or more for extra
>    security" is to allow us to use a very long hash.  It does not
>    help to allow an object to be referred to with any of these 2 or
>    more hashes at the same time.

If hard-linking all names is doable, then most restrictions above are
gone, correct?


>  * If employing 2 or more hashes by combining into one may enhance
>    the security, that is wonderful.  But we want to discourage
>    people from inventing their own combinations left and right and
>    end up fragmenting the world.  If a project that begins with
>    SHA-1 only naming is forked to two (or more) and each fork uses
>    different hashes, merging them back will become harder than
>    necessary unless you support all these hashes forks used.

Agree on discouraging people's inventions.

That is why I believe that some HASH (e.g. SHA-3) must be the blessed one.
All git >= 3.x.x must support at least this one (for naming and
cross-referencing between objects).


> Having said all that, the way to figure out the hash used in the way
> we spell the object name may not be the best place to discourage
> people from using random hashes of their choice.  But I think we
> want to avoid doing something that would actively encourage
> fragmentation.

I guess the "blessed SHA-3 will discourage people using the other
names., untill the next crypto-crack.

^ permalink raw reply	[flat|nested] 136+ messages in thread

* Re: SHA1 collisions found
  2017-02-24 23:39     ` Jeff King
@ 2017-02-25  0:39       ` Linus Torvalds
  2017-02-25  0:54         ` Linus Torvalds
                           ` (3 more replies)
  2017-02-25  1:00       ` David Lang
  1 sibling, 4 replies; 136+ messages in thread
From: Linus Torvalds @ 2017-02-25  0:39 UTC (permalink / raw)
  To: Jeff King; +Cc: Junio C Hamano, Ian Jackson, Joey Hess, Git Mailing List

On Fri, Feb 24, 2017 at 3:39 PM, Jeff King <peff@peff.net> wrote:
>
> One thing I worry about in a mixed-hash setting is how often the two
> will be mixed.

Honestly, I think that a primary goal for a new hash implementation
absolutely needs to be to minimize mixing.

Not for security issues, but because of combinatorics. You want to
have a model that basically reads old data, but that very aggressively
approaches "new data only" in order to avoid the situation where you
have basically the exact same tree state, just _represented_
differently.

For example, what I would suggest the rules be is something like this:

 - introduce new tag2/commit2/tree2/blob2 object type tags that imply
that they were hashed using the new hash

 - an old type obviously can never contain a pointer to a new type (ie
you can't have a "tree" object that contains a tree2 object or a blob2
object).

 - but also make the rule that a *new* type can never contain a
pointer to an old type, with the *very* specific exception that a
commit2 can have a parent that is of type "commit".
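
A rough sketch of those three rules as a single link-validity check
(the *2 type values and the helper are invented for illustration):

  enum object_type { OBJ_COMMIT, OBJ_TREE, OBJ_BLOB, OBJ_TAG,
                     OBJ_COMMIT2, OBJ_TREE2, OBJ_BLOB2, OBJ_TAG2 };

  static int is_v2(enum object_type t)
  {
          return t >= OBJ_COMMIT2;
  }

  /* May an object of type "from" point at one of type "to"?
   * is_parent is 1 only for the parent links of a commit. */
  static int link_allowed(enum object_type from, enum object_type to,
                          int is_parent)
  {
          if (!is_v2(from))
                  return !is_v2(to);  /* old may only point at old */
          if (is_v2(to))
                  return 1;           /* new -> new is always fine */
          /* the one exception: commit2 -> old "commit" parent */
          return from == OBJ_COMMIT2 && to == OBJ_COMMIT && is_parent;
  }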

That way everything "converges" towards the new format: the only way
you can stay on the old format is if you only have old-format objects,
and once you have a new-format object all your objects are going to be
new format - except for the history.

Obviously, if somebody stays in old format, you might end up still
getting some object duplication when you continue to merge from him,
but that tree can never merge back without converting to new-format,
so it will be a temporary situation.

So you will end up with duplicate objects, and that's not good (think
of what it does to all our full-tree "diff" optimizations, for example
- you no longer get the "these sub-trees are identical" across a
format change), but realistically you'll have a very limited time of
that kind of duplication.

I'd furthermore suggest that from a UI standpoint, we'd

 - convert to 64-character hex numbers (32-byte hashes)

 - (as mentioned earlier) default to a 40-character abbreviation

 - make the old 40-character SHA1's just show up within the same
address space (so they'd also be encoded as 32-byte hashes, just with
the last 12 bytes zero).

 - you'd see in the "object->type" whether it's a new or old-style hash.

I suspect it shouldn't be too painful to do it that way.
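
For the third point, a minimal sketch of widening an old SHA-1 into the
new 32-byte space (constant names invented):

  #include <string.h>

  #define SHA1_RAWSZ 20
  #define NEW_RAWSZ  32

  /* Embed a 20-byte SHA-1 in a 32-byte slot, last 12 bytes zero. */
  static void widen_sha1(unsigned char out[NEW_RAWSZ],
                         const unsigned char sha1[SHA1_RAWSZ])
  {
          memcpy(out, sha1, SHA1_RAWSZ);
          memset(out + SHA1_RAWSZ, 0, NEW_RAWSZ - SHA1_RAWSZ);
  }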

                Linus

^ permalink raw reply	[flat|nested] 136+ messages in thread

* Re: SHA1 collisions found
  2017-02-25  0:39       ` Linus Torvalds
@ 2017-02-25  0:54         ` Linus Torvalds
  2017-02-25  1:16         ` Jeff King
                           ` (2 subsequent siblings)
  3 siblings, 0 replies; 136+ messages in thread
From: Linus Torvalds @ 2017-02-25  0:54 UTC (permalink / raw)
  To: Jeff King; +Cc: Junio C Hamano, Ian Jackson, Joey Hess, Git Mailing List

On Fri, Feb 24, 2017 at 4:39 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
>  - you'd see in the "object->type" whether it's a new or old-style hash.

Actually, I take that back. I think it might be easier to keep
"object->type" as-is, and it would only show the current OBJ_xyz
fields. Then writing the SHA ends up deciding whether an OBJ_COMMIT
gets written as "commit" or "commit2".

With the reachability rules, you'd never have any ambiguity about which to use.
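
A sketch of that writer-side decision (helper name invented; the
in-core values stay as they are, only the serialized tag changes):

  enum object_type { OBJ_COMMIT, OBJ_TREE, OBJ_BLOB, OBJ_TAG };

  static const char *type_tag_for_write(enum object_type t, int newhash)
  {
          switch (t) {
          case OBJ_COMMIT: return newhash ? "commit2" : "commit";
          case OBJ_TREE:   return newhash ? "tree2"   : "tree";
          case OBJ_BLOB:   return newhash ? "blob2"   : "blob";
          case OBJ_TAG:    return newhash ? "tag2"    : "tag";
          default:         return NULL;
          }
  }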

                Linus

^ permalink raw reply	[flat|nested] 136+ messages in thread

* Re: SHA1 collisions found
  2017-02-24 23:39     ` Jeff King
  2017-02-25  0:39       ` Linus Torvalds
@ 2017-02-25  1:00       ` David Lang
  2017-02-25  1:15         ` Stefan Beller
  2017-02-25  1:21         ` Jeff King
  1 sibling, 2 replies; 136+ messages in thread
From: David Lang @ 2017-02-25  1:00 UTC (permalink / raw)
  To: Jeff King; +Cc: Junio C Hamano, Ian Jackson, Joey Hess, git

On Fri, 24 Feb 2017, Jeff King wrote:

>
> So I'd much rather see strong rules like:
>
>  1. Once a repo has flag-day switched over to the new hash format[1],
>     new references are _always_ done with the new hash. Even ones that
>     point to pre-flag-day objects!

how do you define when a repo has "switched over" to the new format in a 
distributed environment?

so you have one person working on a project that switches their version of git 
to the new one that uses the new format.

But other people they interact with still use older versions of git

what happens when you have someone working on two different projects where one 
has switched and the other hasn't?

what if they are forks of each other? (LEDE and OpenWRT, or just linux-kernel 
and linux-kernel-stable)


>     So you get a "commit-v2" object instead of a "commit", and it has a
>     distinct hash identity from its "commit" counterpart. You can point
>     to a classic "commit", but you do so by its new-hash.
>
>     The flag-day switch would probably be a repo config flag based on
>     repositoryformatversion (so old versions would just punt if they
>     see it). Let's call this flag "newhash" for lack of a better term.

so how do you interact with someone who only expects the old commit instead of 
the commit-v2?

David Lang

^ permalink raw reply	[flat|nested] 136+ messages in thread

* Re: SHA1 collisions found
  2017-02-25  1:00       ` David Lang
@ 2017-02-25  1:15         ` Stefan Beller
  2017-02-25  1:21         ` Jeff King
  1 sibling, 0 replies; 136+ messages in thread
From: Stefan Beller @ 2017-02-25  1:15 UTC (permalink / raw)
  To: David Lang
  Cc: Jeff King, Junio C Hamano, Ian Jackson, Joey Hess,
	git@vger.kernel.org

On Fri, Feb 24, 2017 at 5:00 PM, David Lang <david@lang.hm> wrote:
> On Fri, 24 Feb 2017, Jeff King wrote:
>
>>
>> So I'd much rather see strong rules like:
>>
>>  1. Once a repo has flag-day switched over to the new hash format[1],
>>     new references are _always_ done with the new hash. Even ones that
>>     point to pre-flag-day objects!
>
>
> how do you define when a repo has "switched over" to the new format in a
> distributed environment?
>
> so you have one person working on a project that switches their version of
> git to the new one that uses the new format.
>
> But other people they interact with still use older versions of git
>
> what happens when you have someone working on two different projects where
> one has switched and the other hasn't?

you get infected by the "new version requirement"
as soon as you pull? (GPL is cancer, anyone? ;)

If you are using an old version of git that doesn't understand the new version,
you're screwed.

^ permalink raw reply	[flat|nested] 136+ messages in thread

* Re: SHA1 collisions found
  2017-02-25  0:39       ` Linus Torvalds
  2017-02-25  0:54         ` Linus Torvalds
@ 2017-02-25  1:16         ` Jeff King
  2017-02-26 18:55           ` Junio C Hamano
  2017-02-25  6:10         ` Junio C Hamano
  2017-03-02 19:55         ` Linus Torvalds
  3 siblings, 1 reply; 136+ messages in thread
From: Jeff King @ 2017-02-25  1:16 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Junio C Hamano, Ian Jackson, Joey Hess, Git Mailing List

On Fri, Feb 24, 2017 at 04:39:45PM -0800, Linus Torvalds wrote:

> For example, what I would suggest the rules be is something like this:
> 
>  - introduce new tag2/commit2/tree2/blob2 object type tags that imply
> that they were hashed using the new hash
> 
>  - an old type obviously can never contain a pointer to a new type (ie
> you can't have a "tree" object that contains a tree2 object or a blob2
> object).
> 
>  - but also make the rule that a *new* type can never contain a
> pointer to an old type, with the *very* specific exception that a
> commit2 can have a parent that is of type "commit".

Yeah, this is exactly what I had in mind. That way everybody in
"newhash" mode has no decisions to make. They follow the same rules and
it's as if sha1 never existed, except when you follow links in
historical objects.

> [in reply...]
> Actually, I take that back. I think it might be easier to keep
> "object->type" as-is, and it would only show the current OBJ_xyz
> fields. Then writing the SHA ends up deciding whether an OBJ_COMMIT
> gets written as "commit" or "commit2".

Yeah, I think there are some data structures with limited bits for the
"type" fields (e.g., the pack format). So sticking with OBJ_COMMIT might
be nice. For commits and tags, it would be nice to have an "I'm v2"
header at the start so there's no confusion about how they are meant to
be interpreted.

Trees are more difficult, as they don't have any such field. But a valid
tree does need to start with a mode, so sticking some non-numeric flag
at the front of the object would work (it breaks backwards
compatibility, but that's kind of the point).

I dunno. Maybe we do not need those markers at all, and could get by
purely on object-length, or annotating the headers in some way (like
"parent sha256:1234abcd").

It might just be nice if we could very easily identify objects as one
type or the other without having to parse them in detail.
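
A sketch of the annotated-header variant, classifying a parent line by
a hypothetical hash-name prefix (a bare 40-hex name is taken as
classic sha1):

  #include <string.h>

  enum hash_kind { HK_SHA1, HK_SHA256, HK_UNKNOWN };

  static enum hash_kind parent_kind(const char *line)
  {
          if (!strncmp(line, "parent sha256:", 14))
                  return HK_SHA256;
          if (!strncmp(line, "parent ", 7) &&
              strspn(line + 7, "0123456789abcdef") == 40)
                  return HK_SHA1;
          return HK_UNKNOWN;
  }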

> So you will end up with duplicate objects, and that's not good (think
> of what it does to all our full-tree "diff" optimizations, for example
> - you no longer get the "these sub-trees are identical" across a
> format change), but realistically you'll have a very limited time of
> that kind of duplication.

Yeah, cross-flag-day diffs will be more expensive. I think that's
something we have to live with. I was thinking originally that the
sha1->newhash mapping might solve that, but it only works at the blob
level. I.e., you can compare a sha1 and a newhash like:

  if (!hashcmp(sha1_to_newhash(a), b))

without having to look at the contents. But it doesn't work recursively,
because the tree-pointing-to-newhash will have different content.

-Peff

^ permalink raw reply	[flat|nested] 136+ messages in thread

* Re: SHA1 collisions found
  2017-02-25  1:00       ` David Lang
  2017-02-25  1:15         ` Stefan Beller
@ 2017-02-25  1:21         ` Jeff King
  2017-02-25  1:39           ` David Lang
  2017-02-25  2:26           ` Jacob Keller
  1 sibling, 2 replies; 136+ messages in thread
From: Jeff King @ 2017-02-25  1:21 UTC (permalink / raw)
  To: David Lang; +Cc: Junio C Hamano, Ian Jackson, Joey Hess, git

On Fri, Feb 24, 2017 at 05:00:55PM -0800, David Lang wrote:

> On Fri, 24 Feb 2017, Jeff King wrote:
> 
> > 
> > So I'd much rather see strong rules like:
> > 
> >  1. Once a repo has flag-day switched over to the new hash format[1],
> >     new references are _always_ done with the new hash. Even ones that
> >     point to pre-flag-day objects!
> 
> how do you define when a repo has "switched over" to the new format in a
> distributed environment?

You don't. It's a decision for each local repo, but the rules push
everybody towards upgrading (because you forbid them pulling from or
pushing to people who have upgraded).

So in practice, some centralized distribution point switches, and then
it floods out from there.

> so you have one person working on a project that switches their version of
> git to the new one that uses the new format.

That shouldn't happen when they switch. It should happen when they
decide to move their local clone to the new format. So let's assume they
upgrade _and_ decide to switch.

> But other people they interact with still use older versions of git

Those people get forced to upgrade if they want to continue interacting.

> what happens when you have someone working on two different projects where
> one has switched and the other hasn't?

See above. You only flip the flag on for one of the projects.

> what if they are forks of each other? (LEDE and OpenWRT, or just
> linux-kernel and linux-kernel-stable)

Once one flips, the other one needs to flip too, or can't interact with
them. I know that's harsh, and is likely to create headaches. But in the
long run, I think once everything has converged the resulting system is
less insane.

For that reason I _wouldn't_ recommend projects like the kernel flip the
flag immediately. Ideally we write the code and the new versions
permeate the community. Then somebody (per-project) decides that it's
time for the community to start switching.

> >     The flag-day switch would probably be a repo config flag based on
> >     repositoryformatversion (so old versions would just punt if they
> >     see it). Let's call this flag "newhash" for lack of a better term.
> 
> so how do you interact with someone who only expects the old commit instead
> of the commit-v2?

You ask them to upgrade.

-Peff

^ permalink raw reply	[flat|nested] 136+ messages in thread

* Re: SHA1 collisions found
  2017-02-25  1:21         ` Jeff King
@ 2017-02-25  1:39           ` David Lang
  2017-02-25  1:47             ` Jeff King
  2017-02-25  2:28             ` Jacob Keller
  2017-02-25  2:26           ` Jacob Keller
  1 sibling, 2 replies; 136+ messages in thread
From: David Lang @ 2017-02-25  1:39 UTC (permalink / raw)
  To: Jeff King; +Cc: Junio C Hamano, Ian Jackson, Joey Hess, git

On Fri, 24 Feb 2017, Jeff King wrote:

>> what if they are forks of each other? (LEDE and OpenWRT, or just
>> linux-kernel and linux-kernel-stable)
>
> Once one flips, the other one needs to flip too, or can't interact with
> them. I know that's harsh, and is likely to create headaches. But in the
> long run, I think once everything has converged the resulting system is
> less insane.
>
> For that reason I _wouldn't_ recommend projects like the kernel flip the
> flag immediately. Ideally we write the code and the new versions
> permeate the community. Then somebody (per-project) decides that it's
> time for the community to start switching.

can you 'un-flip' the flag? or if you have someone who is a developer flip their
repo (because they heard that sha1 is unsafe, and they want to be safe), they
can't contribute to the kernel. We don't want to have them lose all their work,
so how can they convert their local repo back to something that's compatible?

how would submodules work if one module flips and another (or the parent) 
doesn't?

OpenWRT/LEDE have their core repo, and they pull from many other (unrelated) 
projects into that repo (and then have 'feeds', which is sort-of-like-submodules 
to pull in other software that's maintained completely independently)

Microsoft has made lots of money with people being forced to upgrade Word 
because one person got a new version and everyone else needed to upgrade to be 
compatible. There's a LOT of pain during that process. Is that really the best 
way to go?

David Lang


^ permalink raw reply	[flat|nested] 136+ messages in thread

* Re: SHA1 collisions found
  2017-02-25  1:39           ` David Lang
@ 2017-02-25  1:47             ` Jeff King
  2017-02-25  1:56               ` David Lang
  2017-02-25  2:28             ` Jacob Keller
  1 sibling, 1 reply; 136+ messages in thread
From: Jeff King @ 2017-02-25  1:47 UTC (permalink / raw)
  To: David Lang; +Cc: Junio C Hamano, Ian Jackson, Joey Hess, git

On Fri, Feb 24, 2017 at 05:39:43PM -0800, David Lang wrote:

> On Fri, 24 Feb 2017, Jeff King wrote:
> 
> > > what if they are forks of each other? (LEDE and OpenWRT, or just
> > > linux-kernel and linux-kernel-stable)
> > 
> > Once one flips, the other one needs to flip too, or can't interact with
> > them. I know that's harsh, and is likely to create headaches. But in the
> > long run, I think once everything has converged the resulting system is
> > less insane.
> > 
> > For that reason I _wouldn't_ recommend projects like the kernel flip the
> > flag immediately. Ideally we write the code and the new versions
> > permeate the community. Then somebody (per-project) decides that it's
> > time for the community to start switching.
> 
> can you 'un-flip' the flag? or if you have someone who is a developer flip
> their repo (because they heard that sha1 is unsafe, and they want to be
> safe), they can't contribute to the kernel. We don't want to have them lose
> all their work, so how can they convert their local repo back to something
> that's compatible?

I don't think it would be too hard to write an un-flipper (it's
basically just rewriting the newhash bit of history using sha1, and
converting your refs back to point at the sha1s).

> how would submodules work if one module flips and another (or the parent)
> doesn't?

That's a good question. It's possible that another exception should be
carved out for referring to a gitlink via sha1 (we _could_ say "no,
point to a newhash version of the submodule", but I think that creates a
lot of hardship for not much gain).

> OpenWRT/LEDE have their core repo, and they pull from many other (unrelated)
> projects into that repo (and then have 'feeds', which is
> sort-of-like-submodules to pull in other software that's maintained
> completely independently)

I think with submodules this should probably still work.  If they are
pulling in with a subtree-ish strategy, then they'd convert the incoming
trees to the newhash format as part of that.

> Microsoft has made lots of money with people being forced to upgrade Word
> because one person got a new version and everyone else needed to upgrade to
> be compatible. There's a LOT of pain during that process. Is that really the
> best way to go?

I think there's going to be a lot of pain regardless. Any attempt to
mitigate that pain and work seamlessly across old and new versions of
git is going to cause _ongoing_ pain as people quietly rewrite the same
content back and forth with different hashes. The viral-convergence
strategy is painful once (when you're forced to upgrade), but after that
just works.

If you want to work on a dual-hash strategy, be my guest. I can't
promise I'll be able to find horrific corner cases in it, but I
certainly can't even try to do so until there is a concrete proposal. :)

-Peff

^ permalink raw reply	[flat|nested] 136+ messages in thread

* Re: SHA1 collisions found
  2017-02-25  1:47             ` Jeff King
@ 2017-02-25  1:56               ` David Lang
  0 siblings, 0 replies; 136+ messages in thread
From: David Lang @ 2017-02-25  1:56 UTC (permalink / raw)
  To: Jeff King; +Cc: Junio C Hamano, Ian Jackson, Joey Hess, git

On Fri, 24 Feb 2017, Jeff King wrote:

>> OpenWRT/LEDE have their core repo, and they pull from many other (unrelated)
>> projects into that repo (and then have 'feeds', which is
>> sort-of-like-submodules to pull in other software that's maintained
>> completely independently)
>
> I think with submodules this should probably still work.  If they are
> pulling in with a subtree-ish strategy, then they'd convert the incoming
> trees to the newhash format as part of that.

as I understand things, they have two categories of things

1. Feeds, which are completely independent, separate maintainers

2. core, which gets pulled into one repo; I don't know if they use submodules in
the process. I know that what downstream users see is a single repo.

I understand and agree with the idea of trying to converge rapidly. I'm just 
looking at cases where this may be hard (or where there may be holdouts for 
whatever reason)

David Lang

^ permalink raw reply	[flat|nested] 136+ messages in thread

* Re: SHA1 collisions found
  2017-02-25  1:21         ` Jeff King
  2017-02-25  1:39           ` David Lang
@ 2017-02-25  2:26           ` Jacob Keller
  2017-02-25  5:39             ` grarpamp
  1 sibling, 1 reply; 136+ messages in thread
From: Jacob Keller @ 2017-02-25  2:26 UTC (permalink / raw)
  To: Jeff King
  Cc: David Lang, Junio C Hamano, Ian Jackson, Joey Hess,
	Git mailing list

On Fri, Feb 24, 2017 at 5:21 PM, Jeff King <peff@peff.net> wrote:
> On Fri, Feb 24, 2017 at 05:00:55PM -0800, David Lang wrote:
>
>> On Fri, 24 Feb 2017, Jeff King wrote:
>>
>> >
>> > So I'd much rather see strong rules like:
>> >
>> >  1. Once a repo has flag-day switched over to the new hash format[1],
>> >     new references are _always_ done with the new hash. Even ones that
>> >     point to pre-flag-day objects!
>>
>> how do you define when a repo has "switched over" to the new format in a
>> distributed environment?
>
> You don't. It's a decision for each local repo, but the rules push
> everybody towards upgrading (because you forbid them pulling from or
> pushing to people who have upgraded).
>
> So in practice, some centralized distribution point switches, and then
> it floods out from there.

This seems like the most reasonable strategy so far. I think that
trying to allow long-term co-existence is a huge pain that discourages
switching, when we actually want to encourage everyone to switch once
someone has switched.

I don't think it's sane to try and allow simultaneous use of both
hashes, since that creates a lot of headaches and discourages
transition somewhat.

Thanks,
Jake

^ permalink raw reply	[flat|nested] 136+ messages in thread

* Re: SHA1 collisions found
  2017-02-25  1:39           ` David Lang
  2017-02-25  1:47             ` Jeff King
@ 2017-02-25  2:28             ` Jacob Keller
  1 sibling, 0 replies; 136+ messages in thread
From: Jacob Keller @ 2017-02-25  2:28 UTC (permalink / raw)
  To: David Lang
  Cc: Jeff King, Junio C Hamano, Ian Jackson, Joey Hess,
	Git mailing list

On Fri, Feb 24, 2017 at 5:39 PM, David Lang <david@lang.hm> wrote:
> On Fri, 24 Feb 2017, Jeff King wrote:
>
>>> what if they are forks of each other? (LEDE and OpenWRT, or just
>>> linux-kernel and linux-kernel-stable)
>>
>>
>> Once one flips, the other one needs to flip too, or can't interact with
>> them. I know that's harsh, and is likely to create headaches. But in the
>> long run, I think once everything has converged the resulting system is
>> less insane.
>>
>> For that reason I _wouldn't_ recommend projects like the kernel flip the
>> flag immediately. Ideally we write the code and the new versions
>> permeate the community. Then somebody (per-project) decides that it's
>> time for the community to start switching.
>
>
> can you 'un-flip' the flag? or if you have someone who is a developer flip
> their repo (because they heard that sha1 is unsafe, and they want to be
> safe), they can't contribute to the kernel. We don't want to have them lose
> all their work, so how can they convert their local repo back to something
> that's compatible?

I'd think one of the first things we want is a way to flip *and*
unflip by re-writing history, git-filter-branch style. (So if you
wanted, you could also flip all your old history).

One unrelated thought I had. When an old client sees the new stuff, it
will probably fail in a lot of weird ways. I wonder what we can do so
that, if we have to switch to an even newer hash in the future, the old
versions give a cleaner error experience, ideally lessening the pain of
transition somewhat if/when it has to happen again.

Thanks,
Jake

^ permalink raw reply	[flat|nested] 136+ messages in thread

* Re: SHA1 collisions found
  2017-02-25  2:26           ` Jacob Keller
@ 2017-02-25  5:39             ` grarpamp
  0 siblings, 0 replies; 136+ messages in thread
From: grarpamp @ 2017-02-25  5:39 UTC (permalink / raw)
  To: git

Repos should address keeping / 'fixing' broken sha-1 as needed.
They also really need to create new native modes so users can
initialize and use repos with (sha-3 / sha-256 / whatever) going forward.
Backward compatibility with sha-1 or 'fixed sha-1' will be fine. Clients
can 'taste' and 'test' repos for which hash mode to use, or add it to
their configs. Make things flexible, modular, configurable, updateable.
What little point is there in 'fixing / caveating' their use of broken sha-1,
without also doing strong (sha-3 / optionals) in the first place, defaulting
new init's to whichever strong hash looks good, and letting natural
migration to that happen on its own through the default process.
Introducing new hash modes also gives a good opportunity to incorporate
other generally 'incompatible with the old' changes to benefit the future.
One might argue against mixed mode, after all, export and import,
as with any other repo migration, is generally possible.  And mixed
mode tends to prolong the actual endeavour to move to something
better in the init itself. Native and new makes you update to follow.
A lot of question / wrong ramble here, but the point should be
consistent... move, natively, even if only for the sake of the death of old
broken hashes. And attacks only get worse. Thought food is all.

^ permalink raw reply	[flat|nested] 136+ messages in thread

* Re: SHA1 collisions found
  2017-02-25  0:39       ` Linus Torvalds
  2017-02-25  0:54         ` Linus Torvalds
  2017-02-25  1:16         ` Jeff King
@ 2017-02-25  6:10         ` Junio C Hamano
  2017-02-26  1:13           ` Jason Cooper
  2017-03-02 19:55         ` Linus Torvalds
  3 siblings, 1 reply; 136+ messages in thread
From: Junio C Hamano @ 2017-02-25  6:10 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Jeff King, Ian Jackson, Joey Hess, Git Mailing List

Linus Torvalds <torvalds@linux-foundation.org> writes:

> For example, what I would suggest the rules be is something like this:
>
>  - introduce new tag2/commit2/tree2/blob2 object type tags that imply
> that they were hashed using the new hash
>
>  - an old type obviously can never contain a pointer to a new type (ie
> you can't have a "tree" object that contains a tree2 object or a blob2
> object).
>
>  - but also make the rule that a *new* type can never contain a
> pointer to an old type, with the *very* specific exception that a
> commit2 can have a parent that is of type "commit".

OK, I think that is what Peff was suggesting in his message, and I
do not have problem with such a transition plan.  Or the *very*
specific exception could be that a reference to "commit" can use old
name (which would allow binding a submodule before transition to a
new project).

We probably do not need a "blob2" object type, as blobs do not embed
any pointer to another object.  A loose blob with an old name can be made
available on the filesystem also under its new name without much "heavy"
transition, and an in-pack blob can be pointed at with _two_ entries
in the updated pack index file under old and new names, both for the
base (just deflated) representation and also ofs-delta.  A ref-delta
based on another blob with old name may need a bit of special
handling, but the deltification would not be visible at the "struct object"
layer, so probably not such a big deal.

We may also be able to get away without "commit2" and "tag2" as
their pointers can be widened and parse_{commit,tag}_object() should
be able to deal with objects with new names transparently.  "tree2"
may be a bit tricky, but offhand it seems to me that nothing
is insurmountable.

> That way everything "converges" towards the new format: the only way
> you can stay on the old format is if you only have old-format objects,
> and once you have a new-format object all your objects are going to be
> new format - except for the history.

Yes.

> So you will end up with duplicate objects, and that's not good (think
> of what it does to all our full-tree "diff" optimizations, for example
> - you no longer get the "these sub-trees are identical" across a
> format change), but realistically you'll have a very limited time of
> that kind of duplication.
>
> I'd furthermore suggest that from a UI standpoint, we'd
>
>  - convert to 64-character hex numbers (32-byte hashes)
>
>  - (as mentioned earlier) default to a 40-character abbreviation
>
>  - make the old 40-character SHA1's just show up within the same
> address space (so they'd also be encoded as 32-byte hashes, just with
> the last 12 bytes zero).

Yes to all of the above.

>  - you'd see in the "object->type" whether it's a new or old-style hash.

I am not sure if this is needed.  We may need to abstract the tree_entry
walker a little bit as a preparatory step, but I suspect that the hash
(and more importantly the internal format) can be kept as internal
knowledge of the object layer (i.e. {commit,tree,tag}.c).

So,... thanks for straightening me out.  I was thinking we would
need mixed-mode support for a smoother transition, but it now seems to
me that the approach to stratify the history into old and new is
workable.

^ permalink raw reply	[flat|nested] 136+ messages in thread

* Re: SHA1 collisions found
  2017-02-24 17:32   ` Junio C Hamano
                       ` (2 preceding siblings ...)
  2017-02-24 23:43     ` Ian Jackson
@ 2017-02-25 18:50     ` brian m. carlson
  2017-02-25 19:26       ` Jeff King
  3 siblings, 1 reply; 136+ messages in thread
From: brian m. carlson @ 2017-02-25 18:50 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Ian Jackson, Joey Hess, git

[-- Attachment #1: Type: text/plain, Size: 1637 bytes --]

On Fri, Feb 24, 2017 at 09:32:13AM -0800, Junio C Hamano wrote:
> Ian Jackson <ijackson@chiark.greenend.org.uk> writes:
> 
> > I have been thinking about how to do a transition from SHA1 to another
> > hash function.
> 
> Good.  I think many of us have also been, too, not necessarily just
> in the past few days in response to shattered, but over the last 10
> years, yet without coming to a consensus design ;-)
> 
> > I have concluded that:
> >
> >  * We should avoid expecting everyone to rewrite all their
> >    history.
> 
> Yes.

There are security implications for old objects if we mix hashes, but I
suppose people who want better security will just rewrite history
anyway.

> As long as the reader can tell from the format of object names
> stored in the "new object format" object from what era is being
> referred to in some way [*1*], we can name new objects with only new
> hash, I would think.  "new refers only to new" that stratifies
> objects into older and newer may make things simpler, but I am not
> convinced yet that it would give our users a smooth enough
> transition path (but I am open to be educated and persuaded the
> other way).

I would simply use multihash[0] for this purpose.  New-style objects
serialize data in multihash format, so it's immediately obvious what
hash we're referring to.  That makes future transitions less
problematic.

[0] https://github.com/multiformats/multihash
-- 
brian m. carlson / brian with sandals: Houston, Texas, US
+1 832 623 2791 | https://www.crustytoothpaste.net/~bmc | My opinion only
OpenPGP: https://keybase.io/bk2204

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 868 bytes --]

^ permalink raw reply	[flat|nested] 136+ messages in thread

* Re: SHA1 collisions found
  2017-02-24  9:42 ` Duy Nguyen
@ 2017-02-25 19:04   ` brian m. carlson
  2017-02-27 13:29     ` René Scharfe
  0 siblings, 1 reply; 136+ messages in thread
From: brian m. carlson @ 2017-02-25 19:04 UTC (permalink / raw)
  To: Duy Nguyen; +Cc: Joey Hess, Git Mailing List

[-- Attachment #1: Type: text/plain, Size: 1550 bytes --]

On Fri, Feb 24, 2017 at 04:42:38PM +0700, Duy Nguyen wrote:
> On Thu, Feb 23, 2017 at 11:43 PM, Joey Hess <id@joeyh.name> wrote:
> > IIRC someone has been working on parameterizing git's SHA1 assumptions
> > so a repository could eventually use a more secure hash. How far has
> > that gotten? There are still many "40" constants in git.git HEAD.
> 
> Michael asked Brian (that "someone") the other day and he replied [1]
> 
> >> I'm curious; what fraction of the overall convert-to-object_id campaign
> >> do you estimate is done so far? Are you getting close to the promised
> >> land yet?
> >
> > So I think that the current scope left is best estimated by the
> > following command:
> >
> >   git grep -P 'unsigned char\s+(\*|.*20)' | grep -v '^Documentation'
> >
> > So there are approximately 1200 call sites left, which is quite a bit of
> > work.  I estimate between the work I've done and other people's
> > refactoring work (such as the refs backend refactor), we're about 40%
> > done.

As a note, I've been working on this pretty much nonstop since the
collision announcement was made.  After another 27 commits, I've got it
down from 1244 to 1119.
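
For context, each conversion follows roughly this pattern (simplified;
the real definitions live in git's headers and may differ in detail):

  struct object_id {
          unsigned char hash[20];  /* sized for SHA-1 today, wider later */
  };

  /* before: int peel_ref(const char *refname, unsigned char *sha1);  */
  /* after:  int peel_ref(const char *refname, struct object_id *oid); */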

I plan to send another series out sometime after the existing series has
hit next.  People who are interested can follow the object-id-part*
branches at https://github.com/bk2204/git.
-- 
brian m. carlson / brian with sandals: Houston, Texas, US
+1 832 623 2791 | https://www.crustytoothpaste.net/~bmc | My opinion only
OpenPGP: https://keybase.io/bk2204

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 868 bytes --]

^ permalink raw reply	[flat|nested] 136+ messages in thread

* Re: SHA1 collisions found
  2017-02-25 18:50     ` brian m. carlson
@ 2017-02-25 19:26       ` Jeff King
  2017-02-25 22:09         ` Mike Hommey
  0 siblings, 1 reply; 136+ messages in thread
From: Jeff King @ 2017-02-25 19:26 UTC (permalink / raw)
  To: brian m. carlson, Junio C Hamano, Ian Jackson, Joey Hess, git

On Sat, Feb 25, 2017 at 06:50:50PM +0000, brian m. carlson wrote:

> > As long as the reader can tell from the format of object names
> > stored in the "new object format" object from what era is being
> > referred to in some way [*1*], we can name new objects with only new
> > hash, I would think.  "new refers only to new" that stratifies
> > objects into older and newer may make things simpler, but I am not
> > convinced yet that it would give our users a smooth enough
> > transition path (but I am open to be educated and persuaded the
> > other way).
> 
> I would simply use multihash[0] for this purpose.  New-style objects
> serialize data in multihash format, so it's immediately obvious what
> hash we're referring to.  That makes future transitions less
> problematic.
> 
> [0] https://github.com/multiformats/multihash

I looked at that earlier, because I think it's a reasonable idea for
future-proofing. The first byte is a "varint", but I couldn't find where
they defined that format.

The closest I could find is:

  https://github.com/multiformats/unsigned-varint

whose README says:

  This unsigned varint (VARiable INTeger) format is for the use in all
  the multiformats.

    - We have not yet decided on a format yet. When we do, this readme
      will be updated.

    - We have time. All multiformats are far from requiring this varint.

which is not exactly confidence inspiring. They also put the length at
the front of the hash. That's probably convenient if you're parsing an
unknown set of hashes, but I'm not sure it's helpful inside Git objects.
And there's an incentive to minimize header data at the front of a hash,
because every byte is one more byte that every single hash will collide
over, and people will have to type when passing hashes to "git show",
etc.

I'd almost rather use something _really_ verbose like

  sha256:1234abcd...

in all of the objects. And then when we get an unadorned hash from the
user, we guess it's sha256 (or whatever), and fall back to treating it as
a sha1.
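
A sketch of that fallback, with the two lookup functions as stand-ins
(assumed to return 0 on success) for whatever the real resolvers end
up being:

  struct object_id;  /* opaque here */

  int get_oid_sha256(const char *hex, struct object_id *oid);
  int get_oid_sha1(const char *hex, struct object_id *oid);

  /* Resolve an unadorned hex string: guess the new hash first,
   * then fall back to treating it as a sha1. */
  static int resolve_unadorned(const char *hex, struct object_id *oid)
  {
          if (!get_oid_sha256(hex, oid))
                  return 0;
          return get_oid_sha1(hex, oid);
  }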

Using a syntactically-obvious name like that also solves one other
problem: there are sha1 hashes whose first bytes will encode as a "this
is sha256" multihash, creating some ambiguity.

-Peff

^ permalink raw reply	[flat|nested] 136+ messages in thread

* Re: SHA1 collisions found
  2017-02-25 19:26       ` Jeff King
@ 2017-02-25 22:09         ` Mike Hommey
  2017-02-26 17:38           ` brian m. carlson
  0 siblings, 1 reply; 136+ messages in thread
From: Mike Hommey @ 2017-02-25 22:09 UTC (permalink / raw)
  To: Jeff King; +Cc: brian m. carlson, Junio C Hamano, Ian Jackson, Joey Hess, git

On Sat, Feb 25, 2017 at 02:26:56PM -0500, Jeff King wrote:
> On Sat, Feb 25, 2017 at 06:50:50PM +0000, brian m. carlson wrote:
> 
> > > As long as the reader can tell from the format of object names
> > > stored in the "new object format" object from what era is being
> > > referred to in some way [*1*], we can name new objects with only new
> > > hash, I would think.  "new refers only to new" that stratifies
> > > objects into older and newer may make things simpler, but I am not
> > > convinced yet that it would give our users a smooth enough
> > > transition path (but I am open to be educated and persuaded the
> > > other way).
> > 
> > I would simply use multihash[0] for this purpose.  New-style objects
> > serialize data in multihash format, so it's immediately obvious what
> > hash we're referring to.  That makes future transitions less
> > problematic.
> > 
> > [0] https://github.com/multiformats/multihash
> 
> I looked at that earlier, because I think it's a reasonable idea for
> future-proofing. The first byte is a "varint", but I couldn't find where
> they defined that format.
> 
> The closest I could find is:
> 
>   https://github.com/multiformats/unsigned-varint
> 
> whose README says:
> 
>   This unsigned varint (VARiable INTeger) format is for the use in all
>   the multiformats.
> 
>     - We have not yet decided on a format yet. When we do, this readme
>       will be updated.
> 
>     - We have time. All multiformats are far from requiring this varint.
> 
> which is not exactly confidence inspiring. They also put the length at
> the front of the hash. That's probably convenient if you're parsing an
> unknown set of hashes, but I'm not sure it's helpful inside Git objects.
> And there's an incentive to minimize header data at the front of a hash,
> because every byte is one more byte that every single hash will collide
> over, and people will have to type when passing hashes to "git show",
> etc.
> 
> I'd almost rather use something _really_ verbose like
> 
>   sha256:1234abcd...
> 
> in all of the objects. And then when we get an unadorned hash from the
> user, we guess it's sha256 (or whatever), and fall back to treating it as
> a sha1.
> 
> Using a syntactically-obvious name like that also solves one other
> problem: there are sha1 hashes whose first bytes will encode as a "this
> is sha256" multihash, creating some ambiguity.

Indeed, multihash only really is interesting when *all* hashes use it.
And obviously, git can't change the existing sha1s.

Mike

^ permalink raw reply	[flat|nested] 136+ messages in thread

* Re: SHA1 collisions found
  2017-02-24 23:06   ` Jeff King
  2017-02-24 23:35     ` Jakub Narębski
@ 2017-02-25 22:35     ` Lars Schneider
  2017-02-26  0:46       ` Jeff King
  2017-02-26 18:57     ` Thomas Braun
  2 siblings, 1 reply; 136+ messages in thread
From: Lars Schneider @ 2017-02-25 22:35 UTC (permalink / raw)
  To: Jeff King; +Cc: Jakub Narębski, Joey Hess, git


> On 25 Feb 2017, at 00:06, Jeff King <peff@peff.net> wrote:
> 
> On Fri, Feb 24, 2017 at 11:47:46PM +0100, Jakub Narębski wrote:
> 
>> I have just read on ArsTechnica[1] that while a Git repository could be
>> corrupted (though this would require attackers to spend a great amount
>> of resources creating their own collision, which, as said elsewhere
>> in this thread, is allegedly easy to detect), putting two proof-of-concept
>> different PDFs with the same size and SHA-1 actually *breaks* Subversion.
>> The repository can become corrupt, and stop accepting new commits.
>> 
>> From what I understand people tried this, and Git doesn't exhibit
>> such a problem.  I wonder what assumptions SVN made that were broken...
> 
> To be clear, nobody has generated a sha1 collision in Git yet, and you
> cannot blindly use the shattered PDFs to do so. Git's notion of the
> SHA-1 of an object include the header, so somebody would have to do a
> shattered-level collision search for something that starts with the
> correct "blob 1234\0" header.
> 
> So we don't actually know how Git would behave in the face of a SHA-1
> collision. It would be pretty easy to simulate it with something like:
> 
> ---
> diff --git a/block-sha1/sha1.c b/block-sha1/sha1.c
> index 22b125cf8..1be5b5ba3 100644
> --- a/block-sha1/sha1.c
> +++ b/block-sha1/sha1.c
> @@ -231,6 +231,16 @@ void blk_SHA1_Update(blk_SHA_CTX *ctx, const void *data, unsigned long len)
> 		memcpy(ctx->W, data, len);
> }
> 
> +/* sha1 of blobs containing "foo\n" and "bar\n" */
> +static const unsigned char foo_sha1[] = {
> +	0x25, 0x7c, 0xc5, 0x64, 0x2c, 0xb1, 0xa0, 0x54, 0xf0, 0x8c,
> +	0xc8, 0x3f, 0x2d, 0x94, 0x3e, 0x56, 0xfd, 0x3e, 0xbe, 0x99
> +};
> +static const unsigned char bar_sha1[] = {
> +	0x57, 0x16, 0xca, 0x59, 0x87, 0xcb, 0xf9, 0x7d, 0x6b, 0xb5,
> +	0x49, 0x20, 0xbe, 0xa6, 0xad, 0xde, 0x24, 0x2d, 0x87, 0xe6
> +};
> +
> void blk_SHA1_Final(unsigned char hashout[20], blk_SHA_CTX *ctx)
> {
> 	static const unsigned char pad[64] = { 0x80 };
> @@ -248,4 +258,8 @@ void blk_SHA1_Final(unsigned char hashout[20], blk_SHA_CTX *ctx)
> 	/* Output hash */
> 	for (i = 0; i < 5; i++)
> 		put_be32(hashout + i * 4, ctx->H[i]);
> +
> +	/* pretend "foo" and "bar" collide */
> +	if (!memcmp(hashout, bar_sha1, 20))
> +		memcpy(hashout, foo_sha1, 20);
> }

That's a good idea! I wonder if it would make sense to set up an
additional job in TravisCI that patches every Git version with some hash
collisions and then runs special tests. This way we could ensure Git
behaves reasonably in case of a collision. E.g. by printing errors and
not crashing or corrupting the repo. Do you think that would be worth
the effort?

- Lars

^ permalink raw reply	[flat|nested] 136+ messages in thread

* Re: SHA1 collisions found
  2017-02-24 17:23   ` Jason Cooper
@ 2017-02-25 23:22     ` ankostis
  0 siblings, 0 replies; 136+ messages in thread
From: ankostis @ 2017-02-25 23:22 UTC (permalink / raw)
  To: Jason Cooper; +Cc: Ian Jackson, Joey Hess, git

On 24 February 2017 at 18:23, Jason Cooper <git@lakedaemon.net> wrote:
> Hi Ian,
>
> On Fri, Feb 24, 2017 at 03:13:37PM +0000, Ian Jackson wrote:
>> Joey Hess writes ("SHA1 collisions found"):
>> > https://shattered.io/static/shattered.pdf
>> > https://freedom-to-tinker.com/2017/02/23/rip-sha-1/
>> >
>> > IIRC someone has been working on parameterizing git's SHA1 assumptions
>> > so a repository could eventually use a more secure hash. How far has
>> > that gotten? There are still many "40" constants in git.git HEAD.
>>
>> I have been thinking about how to do a transition from SHA1 to another
>> hash function.
>>
>> I have concluded that:
>>
>>  * We should avoid expecting everyone to rewrite all their
>>    history.
>
> Agreed.
>
>>  * Unfortunately, because the data formats (particularly, the commit
>>    header) are not in practice extensible (because of the way existing
>>    code parses them), it is not useful to try to generate new data (new
>>    commits etc.) containing both new hashes and old hashes: old
>>    clients will mishandle the new data.
>
> My thought here is:
>
>  a) re-hash blobs with sha256, hardlink to sha1 objects
>  b) create new tree objects which are mirrors of each sha1 tree object,
>     but purely sha256
>  c) mirror commits, but they are also purely sha256
>  d) future PGP signed tags would sign both hashes (or include both?)


IMHO that is a great idea that needs more attention.
You get to keep 2 or more hash-functions for extra security in a post-quantum world.

And to keep with the sketches made so far,
SHA-3 must always be one of the new hashes.
Actually, you can get rid of SHA-1 completely, and land on Linus's
current sketches for the way ahead.

Thanks,
  Kostis
>
> Which would end up something like:
>
>   .git/
>     \... #usual files
>     \objects
>       \ef
>         \3c39f7522dc55a24f64da9febcfac71e984366
>     \objects-sha2_256
>       \72
>         \604fd2de5f25c89d692b01081af93bcf00d2af34549d8d1bdeb68bc048932
>     \info
>       \...
>     \info-sha2_256
>       \refs #uses sha256 commit identifiers
>
> Basically, keep the sha256 stuff out of the way for legacy clients, and
> new clients will still be able to use it.
>
> There shouldn't be a need to re-sign old signed tags if the underlying
> objects are counter-hashed.  There might need to be some transition
> info, though.
>
> Say a new client does 'git tag -v tags/v3.16' in the kernel tree.  I would
> expect it to check the sha1 hashes, verify the PGP signed tag, and then
> also check the sha256 counter-hashes of the relevant objects.
>
> thx,
>
> Jason.

^ permalink raw reply	[flat|nested] 136+ messages in thread

* Re: SHA1 collisions found
  2017-02-25  0:31                 ` ankostis
@ 2017-02-26  0:16                   ` Jason Cooper
  2017-02-26 17:38                     ` brian m. carlson
  0 siblings, 1 reply; 136+ messages in thread
From: Jason Cooper @ 2017-02-26  0:16 UTC (permalink / raw)
  To: ankostis
  Cc: Junio C Hamano, git, Stefan Beller, David Lang, Ian Jackson,
	Joey Hess

Hi,

On Sat, Feb 25, 2017 at 01:31:32AM +0100, ankostis wrote:
> That is why I believe that some HASH (e.g. SHA-3) must be the blessed one.
> All git >= 3.x.x must support at least this one (for naming and
> cross-referencing between objects).

I would stress caution here.  SHA3 has survived the NIST competition,
but that's about it.  It has *not* received nearly as much scrutiny as
SHA2.

SHA2 is a similar construction to SHA1 (Merkle–Damgård [1]) so it makes
sense to be leery of it, but I would argue it's seasoning merits serious
consideration.

Ideally, bless SHA2-384 (minimum) as the next hash.  Five or so years
down the road, if SHA3 is still in good standing, bless it as the next
hash.


thx,

Jason.

[1]
https://en.wikipedia.org/wiki/Merkle%E2%80%93Damg%C3%A5rd_construction

^ permalink raw reply	[flat|nested] 136+ messages in thread

* Re: SHA1 collisions found
  2017-02-25 22:35     ` Lars Schneider
@ 2017-02-26  0:46       ` Jeff King
  2017-02-26 18:22         ` Junio C Hamano
  0 siblings, 1 reply; 136+ messages in thread
From: Jeff King @ 2017-02-26  0:46 UTC (permalink / raw)
  To: Lars Schneider; +Cc: Jakub Narębski, Joey Hess, git

On Sat, Feb 25, 2017 at 11:35:27PM +0100, Lars Schneider wrote:

> > So we don't actually know how Git would behave in the face of a SHA-1
> > collision. It would be pretty easy to simulate it with something like:
> [...]
> 
> That's a good idea! I wonder if it would make sense to set up an 
> additional job in TravisCI that patches every Git version with some hash 
> collisions and then runs special tests. This way we could ensure Git 
> behaves reasonably in case of a collision. E.g. by printing errors and 
> not crashing or corrupting the repo. Do you think that would be worth 
> the effort?

I think it would be interesting to see the results under various
scenarios. I don't know that it would be all that interesting from an
ongoing CI perspective. But we wouldn't know until somebody actually
writes the tests and we see what they do.

-Peff

^ permalink raw reply	[flat|nested] 136+ messages in thread

* Re: SHA1 collisions found
  2017-02-25  6:10         ` Junio C Hamano
@ 2017-02-26  1:13           ` Jason Cooper
  2017-02-26  5:18             ` Jeff King
  0 siblings, 1 reply; 136+ messages in thread
From: Jason Cooper @ 2017-02-26  1:13 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Linus Torvalds, Jeff King, Ian Jackson, Joey Hess,
	Git Mailing List

Hi Junio,

On Fri, Feb 24, 2017 at 10:10:01PM -0800, Junio C Hamano wrote:
> I was thinking we would need mixed mode support for smoother
> transition, but it now seems to me that the approach to stratify the
> history into old and new is workable.

As someone looking to deploy (and having previously deployed) git in
unconventional roles, I'd like to add one caveat.  The flag day in the
history is great, but I'd like to be able to confirm the integrity of
the old history.

"Counter-hashing" the blobs is easy enough, but the trees, commits and
tags would need to have, iiuc, some sort of cross-reference.  As in my
previous example, "git tag -v v3.16" also checks the counter hash to
further verify the integrity of the history (yes, it *really* needs to
check all of the old hashes, but I'd like to make sure I can do step one
first).

Would there be opposition to counter-hashing the old commits at the flag
day?

thx,

Jason.

^ permalink raw reply	[flat|nested] 136+ messages in thread

* Re: SHA1 collisions found
  2017-02-26  1:13           ` Jason Cooper
@ 2017-02-26  5:18             ` Jeff King
  2017-02-26 18:30               ` brian m. carlson
  2017-03-02 21:46               ` Brandon Williams
  0 siblings, 2 replies; 136+ messages in thread
From: Jeff King @ 2017-02-26  5:18 UTC (permalink / raw)
  To: Jason Cooper
  Cc: Junio C Hamano, Linus Torvalds, Ian Jackson, Joey Hess,
	Git Mailing List

On Sun, Feb 26, 2017 at 01:13:59AM +0000, Jason Cooper wrote:

> On Fri, Feb 24, 2017 at 10:10:01PM -0800, Junio C Hamano wrote:
> > I was thinking we would need mixed mode support for smoother
> > transition, but it now seems to me that the approach to stratify the
> > history into old and new is workable.
> 
> As someone looking to deploy (and having previously deployed) git in
> unconventional roles, I'd like to add one caveat.  The flag day in the
> history is great, but I'd like to be able to confirm the integrity of
> the old history.
> 
> "Counter-hashing" the blobs is easy enough, but the trees, commits and
> tags would need to have, iiuc, some sort of cross-reference.  As in my
> previous example, "git tag -v v3.16" also checks the counter hash to
> further verify the integrity of the history (yes, it *really* needs to
> check all of the old hashes, but I'd like to make sure I can do step one
> first).
> 
> Would there be opposition to counter-hashing the old commits at the flag
> day?

I don't think a counter-hash needs to be embedded into the git objects
themselves. If the "modern" repo format stores everything primarily as
sha-256, say, it will probably need to maintain a (local) mapping table
of sha1/sha256 equivalence. That table can be generated at any time from
the object data (though I suspect we'll keep it up to date as objects
enter the repository).

At the flag day[1], you can make a signed tag with the "correct" mapping
in the tag body (so part of the actual GPG signed data, not referenced
by sha1). Then later you can compare that mapping to the object content
in the repo (or to the local copy of the mapping based on that data).
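
A sketch of one entry in such an equivalence table (layout invented;
this is the 20 + 32 bytes per object estimated in the footnote):

  struct hash_map_entry {
          unsigned char sha1[20];
          unsigned char sha256[32];
  };

Kept sorted by sha1, with a second index sorted by sha256 alongside,
the table can be bsearch()ed in either direction.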

-Peff

[1] You don't even need to wait until the flag day. You can do it now.
    This is conceptually similar to the git-evtag tool, though it just
    signs the blob contents of the tag's current tree state. Signing the
    whole mapping lets you verify the entirety of history, but of course
    that mapping is quite big: 20 + 32 bytes per object for
    sha1/sha-256, which is ~250MB for the kernel. So you'd probably not
    want to do it more than once.

^ permalink raw reply	[flat|nested] 136+ messages in thread

* Re: SHA1 collisions found
  2017-02-26  0:16                   ` Jason Cooper
@ 2017-02-26 17:38                     ` brian m. carlson
  2017-02-26 19:11                       ` Linus Torvalds
  0 siblings, 1 reply; 136+ messages in thread
From: brian m. carlson @ 2017-02-26 17:38 UTC (permalink / raw)
  To: Jason Cooper
  Cc: ankostis, Junio C Hamano, git, Stefan Beller, David Lang,
	Ian Jackson, Joey Hess

[-- Attachment #1: Type: text/plain, Size: 2131 bytes --]

On Sun, Feb 26, 2017 at 12:16:07AM +0000, Jason Cooper wrote:
> Hi,
> 
> On Sat, Feb 25, 2017 at 01:31:32AM +0100, ankostis wrote:
> > That is why I believe that some HASH (e.g. SHA-3) must be the blessed one.
> > All git >= 3.x.x must support at least this one (for naming and
> > cross-referencing between objects).
> 
> I would stress caution here.  SHA3 has survived the NIST competition,
> but that's about it.  It has *not* received nearly as much scrutiny as
> SHA2.
> 
> SHA2 is a similar construction to SHA1 (Merkle–Damgård [1]) so it makes
> sense to be leery of it, but I would argue its seasoning merits serious
> consideration.
> 
> Ideally, bless SHA2-384 (minimum) as the next hash.  Five or so years
> down the road, if SHA3 is still in good standing, bless it as the next
> hash.

I don't think we want to be changing hashes that frequently.  Projects
frequently last longer than five years.  I think using a 256-bit hash is
the right choice because it fits on an 80-column screen in hex format.
384-bit hashes do not.  This matters because line wrapping makes
copy-paste hard, and user experience is important.

I've mentioned this on the list earlier, but here are the contenders in
my view:

SHA-256:
  Common, but cryptanalysis has advanced.  Preimage attacks (and
  preimage resistance is even more important than collision resistance)
  have reached 52 of 64 rounds.  Pseudo-collision attacks are possible
  against 46 of 64 rounds.  Slowest option.
SHA-3-256:
  Less common, but has a wide security margin.  Cryptanalysis is
  ongoing, but has not advanced much.  Somewhat to much faster than
  SHA-256, unless you have SHA-256 hardware acceleration (which almost
  nobody does).
BLAKE2b-256:
  Lower security margin, but extremely fast (faster than SHA-1 and even
  MD5).

My recommendation has been for SHA-3-256, because I think it provides
the best tradeoff between security and performance.
-- 
brian m. carlson / brian with sandals: Houston, Texas, US
+1 832 623 2791 | https://www.crustytoothpaste.net/~bmc | My opinion only
OpenPGP: https://keybase.io/bk2204

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 868 bytes --]

^ permalink raw reply	[flat|nested] 136+ messages in thread

* Re: SHA1 collisions found
  2017-02-25 22:09         ` Mike Hommey
@ 2017-02-26 17:38           ` brian m. carlson
  0 siblings, 0 replies; 136+ messages in thread
From: brian m. carlson @ 2017-02-26 17:38 UTC (permalink / raw)
  To: Mike Hommey; +Cc: Jeff King, Junio C Hamano, Ian Jackson, Joey Hess, git

[-- Attachment #1: Type: text/plain, Size: 3452 bytes --]

On Sun, Feb 26, 2017 at 07:09:44AM +0900, Mike Hommey wrote:
> On Sat, Feb 25, 2017 at 02:26:56PM -0500, Jeff King wrote:
> > I looked at that earlier, because I think it's a reasonable idea for
> > future-proofing. The first byte is a "varint", but I couldn't find where
> > they defined that format.
> > 
> > The closest I could find is:
> > 
> >   https://github.com/multiformats/unsigned-varint
> > 
> > whose README says:
> > 
> >   This unsigned varint (VARiable INTeger) format is for the use in all
> >   the multiformats.
> > 
> >     - We have not yet decided on a format yet. When we do, this readme
> >       will be updated.
> > 
> >     - We have time. All multiformats are far from requiring this varint.
> > 
> > which is not exactly confidence inspiring. They also put the length at
> > the front of the hash. That's probably convenient if you're parsing an
> > unknown set of hashes, but I'm not sure it's helpful inside Git objects.
> > And there's an incentive to minimize header data at the front of a hash,
> > because every byte is one more byte that every single hash will collide
> > over, and people will have to type when passing hashes to "git show",
> > etc.

The multihash spec also says that it's not necessary to implement
varints until we have 127 hashes, and considering that will be in the
far future, I'm quite happy to punt that problem down the road to
someone else[0].

> > I'd almost rather use something _really_ verbose like
> > 
> >   sha256:1234abcd...
> > 
> > in all of the objects. And then when we get an unadorned hash from the
> > user, we guess it's sha256 (or whatever), and fallback to treating it as
> > a sha1.
> > 
> > Using a syntactically-obvious name like that also solves one other
> > problem: there are sha1 hashes whose first bytes will encode as a "this
> > is sha256" multihash, creating some ambiguity.
> 
> Indeed, multihash only really is interesting when *all* hashes use it.
> And obviously, git can't change the existing sha1s.

Well, that's why I said in new objects.  If we're going to default to a
new hash, we can store it inside the object format, but not actually
expose it to the user.

In other words, if we used SHA-256, a tree object would refer to the SHA-1
empty blob as 1114e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 and the
SHA-256 empty blob as
1220473a0f4c3be8a93681a267e3b1e9a7dcda1185436fe141f7749120a303721813,
but user-visible code would parse them as e69d... and 473a... (or as
sha1:e69d and 473a, or something).

There's very little code which actually parses objects, so it's easy
enough to introduce a few new functions to read and write the prefixed
versions within the objects, and leave the rest to work in the same old
user-visible way (or in the way that you've proposed).

Note also that we need some way to distinguish objects in binary form,
since if we mix hashes, we need to be able to read data directly from
pack files and other locations where we serialize data that way.
Multihash would do that, even if we didn't expose that to the user.
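
To illustrate how little code the binary form costs, a parser for such
a prefix might look like this (a sketch, not git code; it assumes
single-byte codes, which holds as long as registered codes stay below
0x80, and uses the multihash table values 0x11 = SHA-1, 0x12 =
SHA-256):

#include <stddef.h>

struct mh_name {
	unsigned char code;		/* 0x11 = SHA-1, 0x12 = SHA-256 */
	unsigned char len;		/* digest length in bytes */
	const unsigned char *digest;
};

/* Returns bytes consumed, or -1 on truncated or varint input. */
static int parse_multihash(const unsigned char *buf, size_t avail,
			   struct mh_name *out)
{
	if (avail < 2 || (buf[0] & 0x80))
		return -1;	/* varint codes punted, as above */
	out->code = buf[0];
	out->len = buf[1];
	if (avail < 2 + (size_t)out->len)
		return -1;
	out->digest = buf + 2;
	return 2 + out->len;
}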

[0] And for the record, I'm a maintenance programmer, and I dislike it
when people punt the problem down the road to someone else, because
that's usually me.
-- 
brian m. carlson / brian with sandals: Houston, Texas, US
+1 832 623 2791 | https://www.crustytoothpaste.net/~bmc | My opinion only
OpenPGP: https://keybase.io/bk2204

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 868 bytes --]

^ permalink raw reply	[flat|nested] 136+ messages in thread

* Re: SHA1 collisions found
  2017-02-26  0:46       ` Jeff King
@ 2017-02-26 18:22         ` Junio C Hamano
  0 siblings, 0 replies; 136+ messages in thread
From: Junio C Hamano @ 2017-02-26 18:22 UTC (permalink / raw)
  To: Jeff King; +Cc: Lars Schneider, Jakub Narębski, Joey Hess, git

Jeff King <peff@peff.net> writes:

> On Sat, Feb 25, 2017 at 11:35:27PM +0100, Lars Schneider wrote:
> ...
>> That's a good idea! I wonder if it would make sense to set up an 
>> additional job in TravisCI that patches every Git version with some hash 
>> collisions and then runs special tests.
>
> I think it would be interesting to see the results under various
> scenarios. I don't know that it would be all that interesting from an
> ongoing CI perspective.

I had the same thought.  

I view such a test as a very good validation measure while we are
finishing up the introduction of the new hash and updating the
codepaths that need to handle both hashes.  But once that work is
concluded, I do not know if testing on an ongoing basis is all that
interesting.

^ permalink raw reply	[flat|nested] 136+ messages in thread

* Re: SHA1 collisions found
  2017-02-26  5:18             ` Jeff King
@ 2017-02-26 18:30               ` brian m. carlson
  2017-03-02 21:46               ` Brandon Williams
  1 sibling, 0 replies; 136+ messages in thread
From: brian m. carlson @ 2017-02-26 18:30 UTC (permalink / raw)
  To: Jeff King
  Cc: Jason Cooper, Junio C Hamano, Linus Torvalds, Ian Jackson,
	Joey Hess, Git Mailing List

[-- Attachment #1: Type: text/plain, Size: 2304 bytes --]

On Sun, Feb 26, 2017 at 12:18:34AM -0500, Jeff King wrote:
> On Sun, Feb 26, 2017 at 01:13:59AM +0000, Jason Cooper wrote:
> 
> > On Fri, Feb 24, 2017 at 10:10:01PM -0800, Junio C Hamano wrote:
> > > I was thinking we would need mixed mode support for smoother
> > > transition, but it now seems to me that the approach to stratify the
> > > history into old and new is workable.
> > 
> > As someone looking to deploy (and having previously deployed) git in
> > unconventional roles, I'd like to add one caveat.  The flag day in the
> > history is great, but I'd like to be able to confirm the integrity of
> > the old history.
> > 
> > "Counter-hashing" the blobs is easy enough, but the trees, commits and
> > tags would need to have, iiuc, some sort of cross-reference.  As in my
> > previous example, "git tag -v v3.16" also checks the counter hash to
> > further verify the integrity of the history (yes, it *really* needs to
> > check all of the old hashes, but I'd like to make sure I can do step one
> > first).
> > 
> > Would there be opposition to counter-hashing the old commits at the flag
> > day?
> 
> I don't think a counter-hash needs to be embedded into the git objects
> themselves. If the "modern" repo format stores everything primarily as
> sha-256, say, it will probably need to maintain a (local) mapping table
> of sha1/sha256 equivalence. That table can be generated at any time from
> the object data (though I suspect we'll keep it up to date as objects
> enter the repository).

I really like this look-aside approach.  I think it makes it really easy
to just rewrite the history internally, but still be able to verify
signed commits and signed tags.  We could even synthesize the blobs and
trees from the new hash versions if we didn't want to store them.

This essentially avoids the need for handling competing hashes in the
same object (and controversy about multihash or other storage
facilities); just specify the new hash in the objects, and look up the
old one in the database if necessary.

This also will be the easiest approach to implement, IMHO.
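
As a rough illustration of how cheaply such a table can be produced
with stock tools, assuming the SHA-256 name of an existing object is
defined as SHA-256 over the very same "<type> <size>\0" header plus
body that git hashes today (note this counter-hashes the existing
SHA-1-referencing bytes, which is what the verification case needs; it
is not the name of a converted object; --batch-all-objects needs git
2.6 or newer):

  git cat-file --batch-all-objects \
        --batch-check='%(objectname) %(objecttype) %(objectsize)' |
  while read sha1 type size
  do
        hash=$({ printf '%s %s\0' "$type" "$size"
                 git cat-file "$type" "$sha1"; } | sha256sum)
        echo "$sha1 ${hash%% *}"
  done >sha1-to-sha256.map
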
-- 
brian m. carlson / brian with sandals: Houston, Texas, US
+1 832 623 2791 | https://www.crustytoothpaste.net/~bmc | My opinion only
OpenPGP: https://keybase.io/bk2204

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 868 bytes --]

^ permalink raw reply	[flat|nested] 136+ messages in thread

* Re: SHA1 collisions found
  2017-02-25  1:16         ` Jeff King
@ 2017-02-26 18:55           ` Junio C Hamano
  0 siblings, 0 replies; 136+ messages in thread
From: Junio C Hamano @ 2017-02-26 18:55 UTC (permalink / raw)
  To: Jeff King; +Cc: Linus Torvalds, Ian Jackson, Joey Hess, Git Mailing List

Jeff King <peff@peff.net> writes:

> Trees are more difficult, as they don't have any such field. But a valid
> tree does need to start with a mode, so sticking some non-numeric flag
> at the front of the object would work (it breaks backwards
> compatibility, but that's kind of the point).

Just like the object header format does not inherently impose a
maximum length the system can handle on our objects or the number of
mode bits we can use in an entry in the tree object [*1*], the
format in which tags and commits refer to other objects does not
impose what hash is used for these references [*2*].  

The object name in the tree format is the oddball; being a binary
20-byte field without any other hint, it does limit us to stick
to SHA-1.

I think the helper functions in tree-walk.h, namely 

	init_tree_desc();
	tree_entry_extract();
	update_tree_entry();

and the associated data structures can be updated to read a tree
object in a new format without affecting the readers too much.  By
having a "I am in a new format" byte at the beginning that cannot be
a valid first byte in the current tree format (non-octal is a good
thing to use here), init_tree_desc() can set things up in the desc
structure to expect that the data that will be read by
tree_entry_extract() and update_tree_entry() are formatted in a new
way, and by varying that "tree-format signature" byte, we can update
the format in the future.
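
The sniffing itself would be nearly free; a rough sketch (enum and
function names invented, not a patch):

/* A valid tree in today's format starts with an octal mode digit. */
enum tree_format { TREE_FMT_CLASSIC, TREE_FMT_NEW };

static enum tree_format sniff_tree_format(const unsigned char *buf,
					  unsigned long size)
{
	if (size && (buf[0] < '0' || buf[0] > '7'))
		return TREE_FMT_NEW;	/* format-signature byte */
	return TREE_FMT_CLASSIC;	/* "<mode> <name>\0<sha1>" entries */
}

init_tree_desc() would record the result in the desc so that
tree_entry_extract() and update_tree_entry() know which layout to
expect.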

So at the loose-object format level, we may not even need "tree2";
we can view this update in a way similar to the change we did when
we started supporting submodules/gitlinks.  Older Git would have
said "There is an object that is not tree or blob recorded" and
barfed, but a newer one takes such a tree just fine.  This "we are now
introducing a new hash, and a tree can either have objects all named
by SHA-1 or all new (non SHA-1) hash" update can be treated the same
way, methinks.

The normal flow to write tree objects is (supposed to be) all
contained in cache-tree.c.  As long as we can tell from "struct
object" which hash names the object (i.e. struct object_id may
become an enum and a union), we should be able to use it to convert
objects near the tip of the existing history to new hashes
incrementally. Ideally, the flag-day for one tip of a dag may be
just a matter of

	git commit --allow-empty -m "object name hash update"

without anything else.  The commit by default would want to name
itself with the new hash, which requires it to get its tree named
with the new hash, which may read the old tree and associated blobs
all named with SHA-1, but write_index_as_tree() should be able to
(1) read the tree with its SHA-1 name to learn what is contained;
(2) read the contents of blobs with their SHA-1 names, and compute
their names with the new hash; and (3) write out a containing tree
object in the updated format and named with the new hash.  And that
would give us the tree object named with the new hash that the
command can write into the new commit object on its "tree" line.


[Footnote]

*1* These lengths and mode bits are spelled out in ASCII without any
    fixed length limit for the number of the bytes in this ASCII
    string that represents the length.  The current code may happen
    to read them into unsigned long and unsigned int, which does
    impose a limit on the individual reader in the sense that if your
    ulong is only 32-bit, you cannot have an object larger than 4GB.
    But that is not an inherent limit in the format; you can lift it
    by upgrading the reader.

*2* They are also spelled out in ASCII and there is no length limit.
    Existing implementation may happen to assume that they are all
    SHA-1, but the readers and the writers can be updated to allow
    other hashes to be used in a way that does not break existing
    code when we are only using SHA-1 by marking a reference that
    uses new hash distinguishable from SHA-1 references.

^ permalink raw reply	[flat|nested] 136+ messages in thread

* Re: SHA1 collisions found
  2017-02-24 23:06   ` Jeff King
  2017-02-24 23:35     ` Jakub Narębski
  2017-02-25 22:35     ` Lars Schneider
@ 2017-02-26 18:57     ` Thomas Braun
  2017-02-26 21:30       ` Jeff King
  2 siblings, 1 reply; 136+ messages in thread
From: Thomas Braun @ 2017-02-26 18:57 UTC (permalink / raw)
  To: Jeff King, Jakub Narębski; +Cc: Joey Hess, git

Am 25.02.2017 um 00:06 schrieb Jeff King:
> So we don't actually know how Git would behave in the face of a SHA-1
> collision. It would be pretty easy to simulate it with something like:
>
> ---
> diff --git a/block-sha1/sha1.c b/block-sha1/sha1.c
> index 22b125cf8..1be5b5ba3 100644
> --- a/block-sha1/sha1.c
> +++ b/block-sha1/sha1.c
> @@ -231,6 +231,16 @@ void blk_SHA1_Update(blk_SHA_CTX *ctx, const void *data, unsigned long len)
>  		memcpy(ctx->W, data, len);
>  }
>  
> +/* sha1 of blobs containing "foo\n" and "bar\n" */
> +static const unsigned char foo_sha1[] = {
> +	0x25, 0x7c, 0xc5, 0x64, 0x2c, 0xb1, 0xa0, 0x54, 0xf0, 0x8c,
> +	0xc8, 0x3f, 0x2d, 0x94, 0x3e, 0x56, 0xfd, 0x3e, 0xbe, 0x99
> +};
> +static const unsigned char bar_sha1[] = {
> +	0x57, 0x16, 0xca, 0x59, 0x87, 0xcb, 0xf9, 0x7d, 0x6b, 0xb5,
> +	0x49, 0x20, 0xbe, 0xa6, 0xad, 0xde, 0x24, 0x2d, 0x87, 0xe6
> +};
> +
>  void blk_SHA1_Final(unsigned char hashout[20], blk_SHA_CTX *ctx)
>  {
>  	static const unsigned char pad[64] = { 0x80 };
> @@ -248,4 +258,8 @@ void blk_SHA1_Final(unsigned char hashout[20], blk_SHA_CTX *ctx)
>  	/* Output hash */
>  	for (i = 0; i < 5; i++)
>  		put_be32(hashout + i * 4, ctx->H[i]);
> +
> +	/* pretend "foo" and "bar" collide */
> +	if (!memcmp(hashout, bar_sha1, 20))
> +		memcpy(hashout, foo_sha1, 20);
>  }

While reading about the subject I came across [1]. The author reduced
the hash size to 4 bits and then played around with git.

Diff taken from the posting (not my code)
--- git-2.7.0~rc0+next.20151210.orig/block-sha1/sha1.c
+++ git-2.7.0~rc0+next.20151210/block-sha1/sha1.c
@@ -246,6 +246,8 @@ void blk_SHA1_Final(unsigned char hashou
    blk_SHA1_Update(ctx, padlen, 8);

    /* Output hash */
-   for (i = 0; i < 5; i++)
-       put_be32(hashout + i * 4, ctx->H[i]);
+   for (i = 0; i < 1; i++)
+       put_be32(hashout + i * 4, (ctx->H[i] & 0xf000000));
+   for (i = 1; i < 5; i++)
+       put_be32(hashout + i * 4, 0);
 }

From a noob git-dev perspective this sounds more flexible.

[1]: http://stackoverflow.com/a/34599081

^ permalink raw reply	[flat|nested] 136+ messages in thread

* Re: SHA1 collisions found
  2017-02-26 17:38                     ` brian m. carlson
@ 2017-02-26 19:11                       ` Linus Torvalds
  2017-02-26 21:38                         ` Ævar Arnfjörð Bjarmason
  0 siblings, 1 reply; 136+ messages in thread
From: Linus Torvalds @ 2017-02-26 19:11 UTC (permalink / raw)
  To: brian m. carlson, Jason Cooper, ankostis, Junio C Hamano,
	Git Mailing List, Stefan Beller, David Lang, Ian Jackson,
	Joey Hess

On Sun, Feb 26, 2017 at 9:38 AM, brian m. carlson
<sandals@crustytoothpaste.net> wrote:
>
> SHA-256:
>   Common, but cryptanalysis has advanced.  Preimage resistance (which is
>   even more important than collision resistance) has gotten to 52 of 64
>   rounds.  Pseudo-collision attacks are possible against 46 of 64
>   rounds.  Slowest option.
> SHA-3-256:
>   Less common, but has a wide security margin.  Cryptanalysis is
>   ongoing, but has not advanced much.  Somewhat to much faster than
>   SHA-256, unless you have SHA-256 hardware acceleration (which almost
>   nobody does).
> BLAKE2b-256:
>   Lower security margin, but extremely fast (faster than SHA-1 and even
>   MD5).
>
> My recommendation has been for SHA-3-256, because I think it provides
> the best tradeoff between security and performance.

I initially was leaning towards SHA256 because of hw acceleration, but
noticed that the Intel SHA NI instructions that they've been talking
about for so long don't seem to actually exist anywhere (maybe the
Goldmont Atoms?)

So SHA256 acceleration is mainly an ARM thing, and nobody develops on
ARM because there's effectively no hardware that is suitable for
developers. Even ARM people just use PCs (and they won't be Goldmont
Atoms).

Reduced-round SHA256 may have been broken, but on the other hand it's
been around for a lot longer too, so ...

But yes, SHA3-256 looks like the sane choice. Performance of hashing
is important in the sense that it shouldn't _suck_, but is largely
secondary. All my profiles on real loads (well, *my* real loads) have
shown that zlib performance is actually much more important than SHA1.

Anyway, I don't think we should make the hash choice based on pure
performance concerns - crypto strength first, assuming performance is
"not horrible". SHA3-256 does sound like the best choice.

And no, we should not make extensibility a primary concern. It is
likely that supporting two hashes will make it easier to support three
in the future, but I do not think those kinds of worries should even
be on the radar.

It's *much* more important that we don't waste memory and CPU cycles
on being overly "generic" than some theoretical "but but maybe in
another fifteen years.."

              Linus

^ permalink raw reply	[flat|nested] 136+ messages in thread

* Re: SHA1 collisions found
  2017-02-26 18:57     ` Thomas Braun
@ 2017-02-26 21:30       ` Jeff King
  2017-02-27  9:57         ` Geert Uytterhoeven
  0 siblings, 1 reply; 136+ messages in thread
From: Jeff King @ 2017-02-26 21:30 UTC (permalink / raw)
  To: Thomas Braun; +Cc: Jakub Narębski, Joey Hess, git

On Sun, Feb 26, 2017 at 07:57:19PM +0100, Thomas Braun wrote:

> While reading about the subject I came across [1]. The author reduced
> the hash size to 4bits and then played around with git.
> 
> Diff taken from the posting (not my code)
> --- git-2.7.0~rc0+next.20151210.orig/block-sha1/sha1.c
> +++ git-2.7.0~rc0+next.20151210/block-sha1/sha1.c
> @@ -246,6 +246,8 @@ void blk_SHA1_Final(unsigned char hashou
>     blk_SHA1_Update(ctx, padlen, 8);
> 
>     /* Output hash */
> -   for (i = 0; i < 5; i++)
> -       put_be32(hashout + i * 4, ctx->H[i]);
> +   for (i = 0; i < 1; i++)
> +       put_be32(hashout + i * 4, (ctx->H[i] & 0xf000000));
> +   for (i = 1; i < 5; i++)
> +       put_be32(hashout + i * 4, 0);
>  }

Yeah, that is a lot more flexible for experimenting. Though I'd think
you'd probably want more than 4 bits just to avoid accidental
collisions. Something like 24 bits gives you some breathing space (you'd
expect a random collision after 4096 objects), but it's still easy to
do a preimage attack if you need to.

-Peff

^ permalink raw reply	[flat|nested] 136+ messages in thread

* Re: SHA1 collisions found
  2017-02-26 19:11                       ` Linus Torvalds
@ 2017-02-26 21:38                         ` Ævar Arnfjörð Bjarmason
  2017-02-26 21:52                           ` Jeff King
  0 siblings, 1 reply; 136+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2017-02-26 21:38 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: brian m. carlson, Jason Cooper, ankostis, Junio C Hamano,
	Git Mailing List, Stefan Beller, David Lang, Ian Jackson,
	Joey Hess

On Sun, Feb 26, 2017 at 8:11 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> But yes, SHA3-256 looks like the sane choice. Performance of hashing
> is important in the sense that it shouldn't _suck_, but is largely
> secondary. All my profiles on real loads (well, *my* real loads) have
> shown that zlib performance is actually much more important than SHA1.

What's the zlib v.s. hash ratio on those profiles? If git is switching
to another hashing function given the developments in faster
compression algorithms (gzip v.s. snappy v.s. zstd v.s. lz4)[1] we'll
probably switch to another compression algorithm sooner than later.

Would compression still be the bottleneck by far with zstd, how about with lz4?

1. https://code.facebook.com/posts/1658392934479273/smaller-and-faster-data-compression-with-zstandard/

^ permalink raw reply	[flat|nested] 136+ messages in thread

* Re: SHA1 collisions found
  2017-02-26 21:38                         ` Ævar Arnfjörð Bjarmason
@ 2017-02-26 21:52                           ` Jeff King
  2017-02-27 13:00                             ` Transition plan for git to move to a new hash function Ian Jackson
  0 siblings, 1 reply; 136+ messages in thread
From: Jeff King @ 2017-02-26 21:52 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Linus Torvalds, brian m. carlson, Jason Cooper, ankostis,
	Junio C Hamano, Git Mailing List, Stefan Beller, David Lang,
	Ian Jackson, Joey Hess

On Sun, Feb 26, 2017 at 10:38:35PM +0100, Ævar Arnfjörð Bjarmason wrote:

> On Sun, Feb 26, 2017 at 8:11 PM, Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
> > But yes, SHA3-256 looks like the sane choice. Performance of hashing
> > is important in the sense that it shouldn't _suck_, but is largely
> > secondary. All my profiles on real loads (well, *my* real loads) have
> > shown that zlib performance is actually much more important than SHA1.
> 
> What's the zlib v.s. hash ratio on those profiles? If git is switching
> to another hashing function given the developments in faster
> compression algorithms (gzip v.s. snappy v.s. zstd v.s. lz4)[1] we'll
> probably switch to another compression algorithm sooner than later.
> 
> Would compression still be the bottleneck by far with zstd, how about with lz4?
> 
> 1. https://code.facebook.com/posts/1658392934479273/smaller-and-faster-data-compression-with-zstandard/

zstd does help in normal operations that access lots of blobs. Here are
some timings:

  http://public-inbox.org/git/20161023080552.lma2v6zxmyaiiqz5@sigill.intra.peff.net/

Compression is part of the on-the-wire packfile format, so it introduces
compatibility headaches. Unlike the hash, it _can_ be a local thing
negotiated between the two ends, and a server with zstd data could
convert on-the-fly to zlib. You just wouldn't want to do so on a server
because it's really expensive (or you double your cache footprint to
store both).

If there were a hash flag day, we _could_ make sure all post-flag-day
implementations have zstd, and just start using that (it transparently
handles old zlib data, too). I'm just hesitant to throw in the kitchen
sink and make the hash transition harder than it already is.

Hash performance doesn't matter much for normal read operations. If your
implementation is really _slow_ it does matter for a few operations
(notably index-pack receiving a large push or fetch). Some timings:

  http://public-inbox.org/git/20170223230621.43anex65ndoqbgnf@sigill.intra.peff.net/

If the new algorithm is faster than SHA-1, that might be measurable in
those operations, too, but obviously less dramatic, as hashing is just a
percentage of the total operation (so it can balloon the time if it's
slow, but optimizing it can only save so much).

I don't know if the per-hash setup cost of any of the new algorithms is
higher than SHA-1. We care as much about hashing lots of small content
as we do about sustained throughput of a single hash.
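
The setup-cost question is easy to probe with a few lines of OpenSSL (a
sketch; scale ITER up until the timings are stable, swap EVP_sha256()
for another EVP_MD to compare, and build with `cc -O2 bench.c
-lcrypto'):

#include <stdio.h>
#include <time.h>
#include <openssl/evp.h>

#define ITER 200000
#define SMALL 200	/* roughly a tree entry's worth of data */

int main(void)
{
	static unsigned char buf[ITER * SMALL];	/* static: zero-filled */
	unsigned char md[EVP_MAX_MD_SIZE];
	unsigned int mdlen;
	const EVP_MD *algo = EVP_sha256();
	clock_t t;
	int i;

	/* many small hashes: stresses per-hash setup cost */
	t = clock();
	for (i = 0; i < ITER; i++)
		EVP_Digest(buf + (size_t)i * SMALL, SMALL, md, &mdlen,
			   algo, NULL);
	printf("small: %.2fs\n", (double)(clock() - t) / CLOCKS_PER_SEC);

	/* one big hash of the same bytes: sustained throughput */
	t = clock();
	EVP_Digest(buf, sizeof(buf), md, &mdlen, algo, NULL);
	printf("large: %.2fs\n", (double)(clock() - t) / CLOCKS_PER_SEC);
	return 0;
}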

-Peff

^ permalink raw reply	[flat|nested] 136+ messages in thread

* Re: SHA1 collisions found
  2017-02-26 21:30       ` Jeff King
@ 2017-02-27  9:57         ` Geert Uytterhoeven
  2017-02-27 10:43           ` Jeff King
  0 siblings, 1 reply; 136+ messages in thread
From: Geert Uytterhoeven @ 2017-02-27  9:57 UTC (permalink / raw)
  To: Jeff King; +Cc: Thomas Braun, Jakub Narębski, Joey Hess, Git Mailing List

On Sun, Feb 26, 2017 at 10:30 PM, Jeff King <peff@peff.net> wrote:
> On Sun, Feb 26, 2017 at 07:57:19PM +0100, Thomas Braun wrote:
>> While reading about the subject I came across [1]. The author reduced
>> the hash size to 4bits and then played around with git.
>>
>> Diff taken from the posting (not my code)
>> --- git-2.7.0~rc0+next.20151210.orig/block-sha1/sha1.c
>> +++ git-2.7.0~rc0+next.20151210/block-sha1/sha1.c
>> @@ -246,6 +246,8 @@ void blk_SHA1_Final(unsigned char hashou
>>     blk_SHA1_Update(ctx, padlen, 8);
>>
>>     /* Output hash */
>> -   for (i = 0; i < 5; i++)
>> -       put_be32(hashout + i * 4, ctx->H[i]);
>> +   for (i = 0; i < 1; i++)
>> +       put_be32(hashout + i * 4, (ctx->H[i] & 0xf000000));
>> +   for (i = 1; i < 5; i++)
>> +       put_be32(hashout + i * 4, 0);
>>  }
>
> Yeah, that is a lot more flexible for experimenting. Though I'd think
> you'd probably want more than 4 bits just to avoid accidental
> collisions. Something like 24 bits gives you some breathing space (you'd
> expect a random collision after 4096 objects), but it's still easy to
> do a preimage attack if you need to.

Just shortening the hash causes lots of collisions between objects of
different types. While it's valuable to test git behavior for those cases, you
probably want some way to explicitly test collisions that do not change
the object type, as they're not trivial to detect.

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

^ permalink raw reply	[flat|nested] 136+ messages in thread

* Re: SHA1 collisions found
  2017-02-27  9:57         ` Geert Uytterhoeven
@ 2017-02-27 10:43           ` Jeff King
  2017-02-27 12:39             ` Morten Welinder
  0 siblings, 1 reply; 136+ messages in thread
From: Jeff King @ 2017-02-27 10:43 UTC (permalink / raw)
  To: Geert Uytterhoeven
  Cc: Thomas Braun, Jakub Narębski, Joey Hess, Git Mailing List

On Mon, Feb 27, 2017 at 10:57:37AM +0100, Geert Uytterhoeven wrote:

> > Yeah, that is a lot more flexible for experimenting. Though I'd think
> > you'd probably want more than 4 bits just to avoid accidental
> > collisions. Something like 24 bits gives you some breathing space (you'd
> > expect a random collision after 4096 objects), but it's still easy to
> > do a preimage attack if you need to.
> 
> Just shortening the hash causes lots of collisions between objects of
> different types. While it's valuable to test git behavior for those cases, you
> probably want some way to explicitly test collisions that do not change
> the object type, as they're not trivial to detect.

Right, that's why I'm suggesting to make a longer truncation so that
you don't get accidental collisions, but can still find a few specific
ones for your testing.

24 bits is enough to make toy repositories. If you wanted to store a
real repository with the truncated sha1s, you might use 36 bits (that's
9 hex characters, which is enough for git.git to avoid any accidental
collisions). But you can still find a collision via brute force in 2^18
tries, which is not so bad.
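
(Both figures are just the birthday bound: with an n-bit hash you
expect the first collision after roughly 2^(n/2) hashes, so 2^12 = 4096
objects at 24 bits and 2^18 = 262144 brute-force tries at 36 bits.)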

I.e., something like:

diff --git a/block-sha1/sha1.c b/block-sha1/sha1.c
index 22b125cf8..9158e39ed 100644
--- a/block-sha1/sha1.c
+++ b/block-sha1/sha1.c
@@ -233,6 +233,10 @@ void blk_SHA1_Update(blk_SHA_CTX *ctx, const void *data, unsigned long len)
 
 void blk_SHA1_Final(unsigned char hashout[20], blk_SHA_CTX *ctx)
 {
+	/* copy out only the first 36 bits */
+	static const uint32_t mask_bits[5] = {
+		0xffffffff, 0xf0000000
+	};
 	static const unsigned char pad[64] = { 0x80 };
 	unsigned int padlen[2];
 	int i;
@@ -247,5 +251,5 @@ void blk_SHA1_Final(unsigned char hashout[20], blk_SHA_CTX *ctx)
 
 	/* Output hash */
 	for (i = 0; i < 5; i++)
-		put_be32(hashout + i * 4, ctx->H[i]);
+		put_be32(hashout + i * 4, ctx->H[i] & mask_bits[i]);
 }

Build that and make it available as git.broken, and then feed your repo
into it, like:

  git init --bare fake.git
  git fast-export HEAD | git.broken -C fake.git fast-import

at which point you have an alternate-universe version of the repository,
which you can operate on as usual with your git.broken tool.

And then you can come up with collisions via brute force:

  # hack to convince hash-object to do lots of sha1s in a single
  # invocation
  N=300000
  for i in $(seq $N); do
    echo $i >$i
  done
  seq 300000 | git.broken hash-object --stdin-paths >hashes

  for collision in $(sort hashes | uniq -d); do
	grep -n $collision hashes
  done

The result is that "33713\n" and "170653\n" collide. So you can now add
those to your fake.git repository and watch the chaos ensue.

-Peff

^ permalink raw reply related	[flat|nested] 136+ messages in thread

* Re: SHA1 collisions found
  2017-02-27 10:43           ` Jeff King
@ 2017-02-27 12:39             ` Morten Welinder
  0 siblings, 0 replies; 136+ messages in thread
From: Morten Welinder @ 2017-02-27 12:39 UTC (permalink / raw)
  To: Jeff King
  Cc: Geert Uytterhoeven, Thomas Braun, Jakub Narębski, Joey Hess,
	Git Mailing List

Just swap in md5 in place of sha1.  Pad with '0'.  That'll give you
all the collisions you want and none of those you don't want.

On Mon, Feb 27, 2017 at 5:43 AM, Jeff King <peff@peff.net> wrote:
> On Mon, Feb 27, 2017 at 10:57:37AM +0100, Geert Uytterhoeven wrote:
>
>> > Yeah, that is a lot more flexible for experimenting. Though I'd think
>> > you'd probably want more than 4 bits just to avoid accidental
>> > collisions. Something like 24 bits gives you some breathing space (you'd
>> > expect a random collision after 4096 objects), but it's still easy to
>> > do a preimage attack if you need to.
>>
>> Just shortening the hash causes lots of collisions between objects of
>> different types. While it's valuable to test git behavior for those cases, you
>> probably want some way to explicitly test collisions that do not change
>> the object type, as they're not trivial to detect.
>
> Right, that's why I'm suggesting to make a longer truncation so that
> you don't get accidental collisions, but can still find a few specific
> ones for your testing.
>
> 24 bits is enough to make toy repositories. If you wanted to store a
> real repository with the truncated sha1s, you might use 36 bits (that's
> 9 hex characters, which is enough for git.git to avoid any accidental
> collisions). But you can still find a collision via brute force in 2^18
> tries, which is not so bad.
>
> I.e., something like:
>
> diff --git a/block-sha1/sha1.c b/block-sha1/sha1.c
> index 22b125cf8..9158e39ed 100644
> --- a/block-sha1/sha1.c
> +++ b/block-sha1/sha1.c
> @@ -233,6 +233,10 @@ void blk_SHA1_Update(blk_SHA_CTX *ctx, const void *data, unsigned long len)
>
>  void blk_SHA1_Final(unsigned char hashout[20], blk_SHA_CTX *ctx)
>  {
> +       /* copy out only the first 36 bits */
> +       static const uint32_t mask_bits[5] = {
> +               0xffffffff, 0xf0000000
> +       };
>         static const unsigned char pad[64] = { 0x80 };
>         unsigned int padlen[2];
>         int i;
> @@ -247,5 +251,5 @@ void blk_SHA1_Final(unsigned char hashout[20], blk_SHA_CTX *ctx)
>
>         /* Output hash */
>         for (i = 0; i < 5; i++)
> -               put_be32(hashout + i * 4, ctx->H[i]);
> +               put_be32(hashout + i * 4, ctx->H[i] & mask_bits[i]);
>  }
> Build that and make it available as git.broken, and then feed your repo
> into it, like:
>
>   git init --bare fake.git
>   git fast-export HEAD | git.broken -C fake.git fast-import
>
> at which point you have an alternate-universe version of the repository,
> which you can operate on as usual with your git.broken tool.
>
> And then you can come up with collisions via brute force:
>
>   # hack to convince hash-object to do lots of sha1s in a single
>   # invocation
>   N=300000
>   for i in $(seq $N); do
>     echo $i >$i
>   done
>   seq 300000 | git.broken hash-object --stdin-paths >hashes
>
>   for collision in $(sort hashes | uniq -d); do
>         grep -n $collision hashes
>   done
>
> The result is that "33713\n" and "170653\n" collide. So you can now add
> those to your fake.git repository and watch the chaos ensue.
>
> -Peff

^ permalink raw reply	[flat|nested] 136+ messages in thread

* Transition plan for git to move to a new hash function
  2017-02-26 21:52                           ` Jeff King
@ 2017-02-27 13:00                             ` Ian Jackson
  2017-02-27 14:37                               ` Why BLAKE2? Markus Trippelsdorf
                                                 ` (2 more replies)
  0 siblings, 3 replies; 136+ messages in thread
From: Ian Jackson @ 2017-02-27 13:00 UTC (permalink / raw)
  To: Jeff King
  Cc: Ævar Arnfjörð Bjarmason, Linus Torvalds,
	brian m. carlson, Jason Cooper, ankostis, Junio C Hamano,
	Git Mailing List, Stefan Beller, David Lang, Joey Hess

I said I was working on a transition plan.  Here it is.  This is
obviously a draft for review, and I have no official status in the git
project.  But I have extensive experience of protocol compatibility
engineering, and I hope this will be helpful.

Ian.


Subject: Transition plan for git to move to a new hash function


BASIC PRINCIPLES
================

We run multiple hashes in parallel.  Each object is named by exactly
one hash.  We define that objects with identical content, but named by
different hash functions, are different objects.

Objects of one hash may refer to objects named by a different hash
function to their own.  Preference rules arrange that normally, new
hash objects refer to other new hash objects.

The intention is that for most projects, the existing SHA-1 based
history will be retained and a new history built on top of it.
(Rewriting is also possible but means a per-project hard switch.)

We extend the textual object name syntax to explicitly name the hash
used.  Every program that invokes git or speaks git protocols will
need to understand the extended object name syntax.

Packfiles need to be extended to be able to contain objects named by
new hash functions.  Blob objects with identical contents but named by
different hash functions would ideally share storage.

Safety catches prevent accidental incorporation into a project of
incompatibly-new objects, or of additional deprecatedly-old objects.
This allows for incremental deployment.


TEXTUAL SYNTAX
==============

The object name textual syntax is extended.  The new syntax may be
used in all textual git objects and protocols (commits, tags, command
lines, etc.).

We declare that the object name syntax is henceforth
  [A-Z]+[0-9a-z]+ | [0-9a-f]+
and that names [A-Z].* are deprecated as ref name components.

    Rationale:

      Full backwards compatibility (ie, without updating any software
      that calls git) is impossible, because the hash function needs
      to be evident in the name, so the new names must be disjoint
      from all old SHA-1 names.

      We want a short but extensible syntax.  The syntax should impose
      minimal extra requirements on existing git users.  In most
      contexts where existing git users use hashes, ASCII alphanumeric
      object names will fit.  Use of punctuation such as : or even _
      may give trouble to existing users, who are already using
      such things as delimiters.

      In existing deployments, refnames that differ only in case are
      generally avoided (because they are troublesome on
      case-insensitive filesystems).  And conventionally refnames are
      lower case.  So names starting with an upper case letter will be
      disjoint from most existing ref name components.

      (Note that there is no need to write the uppercase letter to a
      filename in the object store; the object store can use a
      different naming scheme.)

      Even though we probably want to keep using hex, it is a good
      idea to reserve the flexibility to use a more compact encoding,
      while not excessively widening the existing permissible
      character set.

Object names using SHA-1 are represented, in text, as at present.

Object names starting with uppercase ASCII letters H or later refer to
new hash functions.  Programs that use `g<objectname>' should ideally
be changed to show `H<hash>' for hash function `H' rather than
`gH<hash>'.

    Rationale:

      Object names starting with A-F might look like hex.  G is
      reserved because of the way that many programs write
      `g<objectname>'.

      This gives us 19 new hash function values until we have to
      start using two-letter hash function prefixes, or decide to
      use A-F after all.

New hash names may be abbreviated, by truncation (just like old
hashes).  The hash function indicator letter must be retained.

Initially we define and assign one new hash function (and textual
object name encoding):

  H<hex>    where <hex> is the BLAKE2b hash of the object
            (in lowercase)

    Note:

      If the git project prefers a different new hash function to
      BLAKE2b, that's fine.  This proposal can even cope with two new
      hash functions in parallel.  One could even choose on a
      per-project basis, or switch back and forth.

      It would also be possible to define a multihash object name,
      where the full object name is the concatenation of two different
      hash function values.  One of the hashes would have to be
      preferred for use when a truncated object name is provided by
      the human user.

We also reserve the following syntax for private experiments:
  E[A-Z]+[0-9a-z]+
We declare that public releases of git will never accept such
object names.
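
Purely to illustrate the grammar (all identifiers invented, and
abbreviation-length rules ignored), a classifier would look something
like:

enum name_kind { NAME_SHA1, NAME_NEWHASH, NAME_EXPERIMENT, NAME_BAD };

static int hex(char c)
{
	return (c >= '0' && c <= '9') || (c >= 'a' && c <= 'f');
}
static int upper(char c)
{
	return c >= 'A' && c <= 'Z';
}
static int loweralnum(char c)
{
	return (c >= '0' && c <= '9') || (c >= 'a' && c <= 'z');
}

static enum name_kind classify_object_name(const char *p)
{
	if (hex(*p)) {			/* [0-9a-f]+ : SHA-1, as today */
		while (hex(*p))
			p++;
		return *p ? NAME_BAD : NAME_SHA1;
	}
	if (p[0] == 'E' && upper(p[1]))
		return NAME_EXPERIMENT;	/* E[A-Z]+... : private use */
	if (*p >= 'H' && *p <= 'Z') {	/* new hash functions: H..Z */
		while (upper(*p))
			p++;
		if (!loweralnum(*p))
			return NAME_BAD;
		while (loweralnum(*p))
			p++;
		return *p ? NAME_BAD : NAME_NEWHASH;
	}
	return NAME_BAD;		/* A-F look like hex, G reserved */
}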

Everywhere in the git object formats and git protocols, a new object
name (with hash function indicator) is permitted where an old object
name is permitted.

A single object may refer to other objects by the hash function which
names the object itself, or by other hash functions, in any
combination.  During git operations, hash function namespace
boundaries in the object graph are traversed freely.

Two additional restrictions: a tree object may be referenced only by
objects named by the same hash function as the tree object itself;
and, a tree object may reference only blobs named by the same hash
function.


IMPLEMENTATION REQUIREMENTS
===========================

The object store will need to store objects named by new hashes,
alongside SHA-1 objects.

In binary protocols, where a SHA-1 object name in binary form was
previously used, a new codepoint must be allocated in a containing
structure (eg a new typecode).  Usually, the new-format binary object
will have a new typecode and also an additional name hash indicator,
and it may also need a length field (as new hashes may be of different
lengths).

Whenever a new hash function textual syntax is defined, corresponding
binary format codepoint(s) are assigned.

Implementation details such as the binary format and protocol
specifications, object store layout, and so on, are outside the scope
of this transition plan.


WHILE WE ARE HERE
=================

We should audit the git data formats for inextensible parsers.  That
will make future changes (for whatever purpose) much less painful.

Specific instances I'm aware of:

* Permit commits and tags to contain unexpected header lines.  Ie,
  lines matching /^\w+\ / before the first blank line, where the
  keyword is not recognised.  These should be ignored (a parser sketch
  follows below).

* The signed push certificate format may need reviewing to check that
  there is space for extension.

The test suite should contain tests of these extension possibilities.
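
For the first item, the change amounts to skipping unknown keywords
rather than dying; a sketch (not git's actual parser, and glossing over
continuation lines of multi-line headers):

#include <string.h>

/* Walk a commit's header section; headers end at the first blank line. */
static int scan_headers(const char *buf, const char *end)
{
	while (buf < end && *buf != '\n') {
		const char *eol = memchr(buf, '\n', end - buf);
		if (!eol)
			return -1;		/* truncated object */
		if (!strncmp(buf, "tree ", 5))
			; /* ... parse tree line ... */
		else if (!strncmp(buf, "parent ", 7))
			; /* ... parse parent line ... */
		else
			; /* unknown keyword: ignore it, don't die */
		buf = eol + 1;
	}
	return 0;
}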

A registry (a la IANA) should be created for the extensible namespaces
(eg of header keywords).

Since new objects can be received and understood only by new software,
anyway, it will be safe to insert extension info whenever we generate
objects named by new hash functions.


CHOICE OF HASH FUNCTION
=======================

Whenever objects are created, it is necessary to choose the hash
function which will be used to name it.

Hash functions are partially ordered, from `older' to `newer'
(implicitly, from worse to better).  The ordering is configurable.
For details of the defaults, see _Transition Plan_.

Each ref has, optionally, a hash function hint associated with it.
Ie, a dropping in .git which names a particular hash function, with
the intent that the next objects made for that ref ought to use the
specified hash function.


Commits
-------

A commit is made (by default) as new as the newest of
 (i) each of its parents
 (ii) if applicable, the hash function hint for the ref to which the
     new commit is to be written

Implicitly this normally means that if HEAD refers to a new commit,
further new commits will be generated on top of it.

The hash function naming an origin commit is controlled by the hint
left in .git for the ref named by HEAD (or for HEAD itself, if HEAD is
detached) by git checkout --orphan or git init.

At boundaries between old and new history, new commit(s) will refer to
old parent(s).
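
In code, the rule above is just a max() over the configured ordering; a
sketch with invented types (the partial order flattened to an enum for
brevity):

enum githash { HASH_SHA1, HASH_BLAKE2B };	/* ordered old -> new */

static enum githash pick_commit_hash(const enum githash *parents, int nr,
				     const enum githash *ref_hint)
{
	enum githash h = HASH_SHA1;
	int i;

	for (i = 0; i < nr; i++)
		if (parents[i] > h)
			h = parents[i];	/* (i) newest parent */
	if (ref_hint && *ref_hint > h)
		h = *ref_hint;		/* (ii) hint on the target ref */
	return h;
}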


Tags
----

A tag is created (by default) to be named by the same hash function as
the object to which it refers.


Trees
-----

Trees are only referenced by objects named by the same hash function
as the tree.

To satisfy this rule, occasionally a tree object named by one hash
must be recursively rewritten into an equivalent tree named by another
hash.

When a tree refers to a commit (ie, a gitlink), it may refer to a
commit named by a different hash function.

Trees generated so that they can be referred to by the index are named
by the hash function which would name the next commit to be made on
HEAD (see `Commits', above).

    Rationale: we want to avoid new commits and tags relying on weak
    hashes.  But we should avoid demanding that commits be rewritten.


Blobs
-----

Blobs are normally referred to by trees.  Trees always refer to blobs
named by the tree's own hash function.

Where a blob is created in other circumstances, the caller should
specify the hash function.


Ref hints
---------

As noted above, each ref may also have a hash function hint associated
with it.  The hint records an intent to switch hash function.

The hash hint is (by default) copied, when the ref value is copied.
So for example if `git checkout foo' makes refs/heads/foo out of
refs/remotes/origin/foo, it will copy the hash hint (or lack of one)
from refs/remotes/origin/foo.

Likewise, the hash hint is conveyed by `git fetch' (by default) and
can be updated with `git push' (though this is not done by default).

The ref hash hint may be set explicitly.  That is how an individual
branch is upgraded.  git checkout --orphan sets it to the hash which
names (or the hint of) the previous HEAD.

When a commit is made and stored in a ref, the hash hint for that ref
is removed iff the hash naming the commit is the same as the hint.


CONFIGURATION - OBJECT STORE BEHAVIOUR
======================================

The object store has configuration to specify which hash functions are
enabled.  Each hash function H has a combination of the following
behaviours, according to configuration:

* Collision behaviour:

  What to do if we encounter an object we already have (eg as part of
  a pack, or with hash-object) but with different contents.

  (a) fail: print a scary message and abort operation (on the basis
    that either (i) the source of the colliding object probably
    intended the preimage that they provided, in which case proceeding
    using our own version is wrong, or (ii) the source is conducting
    (or unwittingly facilitating) an attack).

  (b) tolerate: prefer our own data; print a message, but treat
    the reference as referring to our version of the object.

  In both cases we keep a copy of the second preimage in our .git, for
  forensic purposes.

    Rationale:

       This is used as part of a gradual desupport strategy.  Existing
       history in all existing object stores is safe and cannot be
       corrupted or modified by receiving colliding objects.

       New trees which receive their initial data from a trustworthy
       sender over a trustworthy channel will receive correct data.
       Bad object stores or untrustworthy channels could exploit
       collisions, but not in new regions of the history which are
       presumably using new names.  So even with untrustworthy
       distribution channels, the collisions can only affect
       archaeology.

       Merging previously-unrelated histories does introduce a
       collision hazard, but the collision would have had to have been
       introduced while the colliding hash function was still a live
       hash function in at least one of the two projects.


* Hash function enablement:

  Each hash function is in one of the following states:

  (a) enabled: this hash function is good and available for use

  (b) deprecated (in favour of H2): this hash function is
     available for use, but newly created objects will use another
     hash function instead (specifically, when creating an object,
     this hash function is not considered as a candidate; if as a
     result there are no candidate hash functions, we use the
     specified replacement H2).

     Existing refs referring to objects with this hash, with no ref
     hint, are treated as having a ref hint specifying H2.  If no H2
     is specified, the newest hash is used.

  (c) disabled: existing objects using this hash function can be
     accessed, but no such objects can be created or received.
     (again, a replacement may be specified).  This is used both
     initially to prevent unintended upgrade, and later to block the
     introduction of vulnerable data generated by badly configured
     clients.

* Preference ordering:

  As mentioned in `CHOICE OF HASH FUNCTION', there is a configured
  order on hash functions.  This order should be consistent with the
  enablement configuration.

Details of precise configuration option names are beyond the scope of
this document.


Remote protocol
---------------

During protocol negotiation, a receiver needs to specify what hashes it
understands, and whether it is prepared to see only a partial view.

When the sender is listing its refs, refs naming objects the receiver
cannot understand are either elided (if the receiver is content with a
partial view), or cause an error.
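
For example (capability names invented purely for illustration), a
client might advertise

  hash-funcs=sha1,blake2b partial-view-ok

alongside its other capabilities, and a sender seeing no `blake2b'
would then either hide its BLAKE-named refs (if `partial-view-ok' was
given) or abort with an explanatory error.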


Equality testing
----------------

Note that semantically identical trees (and blobs) may (now) have
different tree objects because those tree objects might use (and be
named by) different hashes.  So (in some contexts at least) tree
comparison cannot any longer be done by comparing object names; rather
an invocation of git diff is needed, or explicit generation of a tree
object named by the right hash.


TRANSITION PLAN
===============

For brevity I will write `SHA' for hashing with SHA-1, using current
unqualified object names, and `BLAKE' for hashing with BLAKE2b, using
H<hex> object names.

Y<n> means `Year <n> after the start of the transition'.
Please adjust timescales to taste.

I will focus on the default configuration as shipped by git upstream,
and the recommended configuration for hosting providers.

Individual projects, and perhaps individual hosting providers, can
make their own choices, if they are willing to set appropriate
configuration (on clients, and servers).


Y0: Implement all of the above.  Test it.

    Default configuration:
       SHA is enabled
       BLAKE is disabled in trees without working trees
       BLAKE is enabled in trees with working trees
       SHA > BLAKE

    Effects:

    Clients are prepared to process BLAKE data, but it is not
    generated by default and cannot be pushed to servers.

    All old git clients still work.

    Early adopters can start using the new hashes, at the cost of
    compatibility.  Projects that want to rewrite history right away
    can do so.  (In both cases, by setting configuration options.)

Y4: BLAKE by default for new projects.
    Conversion enabled for existing projects.
    Old git software is going to start rotting.

    Default configuration change:
       BLAKE > SHA
       BLAKE enabled (even in trees without working trees)

    Suggested multi-project hosting site configuration change:
       Newly created projects should get BLAKE enabled
       Existing projects should retain BLAKE disabled by default
       Button should be provided to start conversion (see below)

    Effects:

    When creating a new working tree, it starts using BLAKE.

    Servers which have been updated will accept BLAKE.

    Servers which have not been updated to Y4's git will need a small
    configuration change (enabling BLAKE) to cope with the new
    projects that are using BLAKE.

    To convert a project, an administrator (or project owner) would
    set BLAKE to enabled, and SHA to deprecated, on the server(s).  On
    the next pull the server will provide ref hints naming BLAKE,
    which will get copied to the client's HEAD.  So the client is
    infected with BLAKE.

    To convert a project branch-by-branch, the administrator would set
    BLAKE to enabled but leave SHA enabled.  Then each branch retains
    its own hash.  A branch can be converted by pushing a BLAKE commit
    to it, or by setting a ref hint on the server (or on the next
    client to push).

Y6: BLAKE by default for all projects
    Existing projects start being converted infectiously.
    It is hard for a project to stop this happening if any of
     their servers are updated.
    Old git software is firmly stuffed.

    Default configuration change
       SHA deprecated in trees without working trees

    Effects:

    Existing projects are, by default, `converted', as described
    above.

Y8: Clients hate SHA
    Clients insist on trying to convert existing projects
    It is very hard to stop this happening.
    Unrepentant servers start being very hard to use.

    Default configuration change
       SHA deprecated (even in trees without working trees)

    Effects:

    Clients will generate only BLAKE.  Hopefully their server will
    accept this!

Y10: Stop accepting new SHA
    No-one can manage to make new SHA commits

    Default configuration change
       SHA disabled in new trees, except during initial
          `clone', `mirror' and similar

    Effects:

    Existing SHA history is retained, and copied to new clients and
    servers.  But established clients and servers reject any newly
    introduced SHA.


-- 
Ian Jackson <ijackson@chiark.greenend.org.uk>   These opinions are my own.

If I emailed you from an address @fyvzl.net or @evade.org.uk, that is
a private address which bypasses my fierce spamfilter.

^ permalink raw reply	[flat|nested] 136+ messages in thread

* Re: SHA1 collisions found
  2017-02-25 19:04   ` brian m. carlson
@ 2017-02-27 13:29     ` René Scharfe
  2017-02-28 13:25       ` brian m. carlson
  0 siblings, 1 reply; 136+ messages in thread
From: René Scharfe @ 2017-02-27 13:29 UTC (permalink / raw)
  To: brian m. carlson; +Cc: Duy Nguyen, Joey Hess, Git Mailing List

Am 25.02.2017 um 20:04 schrieb brian m. carlson:
>>> So I think that the current scope left is best estimated by the
>>> following command:
>>>
>>>   git grep -P 'unsigned char\s+(\*|.*20)' | grep -v '^Documentation'
>>>
>>> So there are approximately 1200 call sites left, which is quite a bit of
>>> work.  I estimate between the work I've done and other people's
>>> refactoring work (such as the refs backend refactor), we're about 40%
>>> done.
> 
> As a note, I've been working on this pretty much nonstop since the
> collision announcement was made.  After another 27 commits, I've got it
> down from 1244 to 1119.
> 
> I plan to send another series out sometime after the existing series has
> hit next.  People who are interested can follow the object-id-part*
> branches at https://github.com/bk2204/git.

Perhaps the following script can help a bit; it converts local and static
variables in specified files.  It's just a simplistic parser which can get
at least shadowing variables, strings and comments wrong, so its results
need to be reviewed carefully.

I failed to come up with an equivalent Coccinelle patch so far. :-/

René


#!/bin/sh
while test $# -gt 0
do
	file="$1"
	tmp="$file.new"
	test -f "$file" &&
	perl -e '
		use strict;
		my %indent;	# decl line no. -> indentation of the decl
		my %old;	# decl line no. -> regex matching the old name
		my %new;	# decl line no. -> replacement text
		my $in_struct = 0;
		while (<>) {
			if (/^(\s*)}/) {
				my $len = length $1;
				foreach my $key (keys %indent) {
					if ($len < length($indent{$key})) {
						delete $indent{$key};
						delete $old{$key};
						delete $new{$key};
					}
				}
				$in_struct = 0;
			}
			if (!$in_struct and /^(\s*)(static )?unsigned char (\w+)\[20\];$/) {
				my $prefix = "$1$2";
				my $name = $3;
				$indent{$.} = $1;
				$old{$.} = qr/(?<!->)(?<!\.)(?<!-)\b$name\b/;
				$name =~ s/sha1/oid/;
				print $prefix . "struct object_id " . $name . ";\n";
				$new{$.} = $name . ".hash";
				next;
			}
			if (/^(\s*)(static )?struct (\w+ )?\{$/) {
				$in_struct = 1;
			}
			if (!$in_struct and ! /\/\*/) {
				foreach my $key (keys %indent) {
					s/$old{$key}/$new{$key}/g;
				}
			}
			print;
		}
	' "$file" >"$tmp" &&
	mv "$tmp" "$file" ||
	exit 1
	shift
done
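
Usage is just a list of files to rewrite in place; e.g. (calling the
script oid-convert.sh here):

  $ sh oid-convert.sh refs.c builtin/*.c
  $ git diff    # then review carefully, as noted above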

^ permalink raw reply	[flat|nested] 136+ messages in thread

* Re: Why BLAKE2?
  2017-02-27 13:00                             ` Transition plan for git to move to a new hash function Ian Jackson
@ 2017-02-27 14:37                               ` Markus Trippelsdorf
  2017-02-27 15:42                                 ` Ian Jackson
  2017-02-27 19:26                               ` Transition plan for git to move to a new hash function Tony Finch
  2017-02-28 21:47                               ` brian m. carlson
  2 siblings, 1 reply; 136+ messages in thread
From: Markus Trippelsdorf @ 2017-02-27 14:37 UTC (permalink / raw)
  To: Ian Jackson
  Cc: Jeff King, Ævar Arnfjörð Bjarmason, Linus Torvalds,
	brian m. carlson, Jason Cooper, ankostis, Junio C Hamano,
	Git Mailing List, Stefan Beller, David Lang, Joey Hess

On 2017.02.27 at 13:00 +0000, Ian Jackson wrote:
> 
> For brevity I will write `SHA' for hashing with SHA-1, using current
> unqualified object names, and `BLAKE' for hashing with BLAKE2b, using
> H<hex> object names.

Why do you choose BLAKE2? SHA-2 is generally considered still fine and
would be the obvious choice. And if you want to be adventurous then
SHA-3 (Keccak) would be the next logical candidate.

-- 
Markus

^ permalink raw reply	[flat|nested] 136+ messages in thread

* Re: Why BLAKE2?
  2017-02-27 14:37                               ` Why BLAKE2? Markus Trippelsdorf
@ 2017-02-27 15:42                                 ` Ian Jackson
  0 siblings, 0 replies; 136+ messages in thread
From: Ian Jackson @ 2017-02-27 15:42 UTC (permalink / raw)
  To: Markus Trippelsdorf
  Cc: Jeff King, Ævar Arnfjörð Bjarmason, Linus Torvalds,
	brian m. carlson, Jason Cooper, ankostis, Junio C Hamano,
	Git Mailing List, Stefan Beller, David Lang, Joey Hess

Markus Trippelsdorf writes ("Re: Why BLAKE2?"):
> On 2017.02.27 at 13:00 +0000, Ian Jackson wrote:
> > For brevity I will write `SHA' for hashing with SHA-1, using current
> > unqualified object names, and `BLAKE' for hashing with BLAKE2b, using
> > H<hex> object names.
> 
> Why do you choose BLAKE2? SHA-2 is generally considered still fine and
> would be the obvious choice. And if you want to be adventurous then
> SHA-3 (Keccak) would be the next logical candidate.

I don't have a strong opinion.  Keccak would be fine too.
We should probably avoid SHA-2.

The main point of my posting was not to argue in favour of a
particular hash function :-).

Ian.

-- 
Ian Jackson <ijackson@chiark.greenend.org.uk>   These opinions are my own.

If I emailed you from an address @fyvzl.net or @evade.org.uk, that is
a private address which bypasses my fierce spamfilter.

^ permalink raw reply	[flat|nested] 136+ messages in thread

* Re: Transition plan for git to move to a new hash function
  2017-02-27 13:00                             ` Transition plan for git to move to a new hash function Ian Jackson
  2017-02-27 14:37                               ` Why BLAKE2? Markus Trippelsdorf
@ 2017-02-27 19:26                               ` Tony Finch
  2017-02-28 21:47                               ` brian m. carlson
  2 siblings, 0 replies; 136+ messages in thread
From: Tony Finch @ 2017-02-27 19:26 UTC (permalink / raw)
  To: Ian Jackson
  Cc: Jeff King, Ævar Arnfjörð Bjarmason, Linus Torvalds,
	brian m. carlson, Jason Cooper, ankostis, Junio C Hamano,
	Git Mailing List, Stefan Beller, David Lang, Joey Hess

Ian Jackson <ijackson@chiark.greenend.org.uk> wrote:

A few questions and one or two suggestions...

> TEXTUAL SYNTAX
> ==============
>
> We also reserve the following syntax for private experiments:
>   E[A-Z]+[0-9a-z]+
> We declare that public releases of git will never accept such
> object names.

Instead of this I would suggest that experimental hash names should have
multi-character prefixes and an easy registration process - rationale:
https://tools.ietf.org/html/rfc6648

> A single object may refer to other objects by the hash function which
> names the object itself, or by other hash functions, in any
> combination.

If I understand it correctly, this freedom is greatly restricted later on
in this document, depending on the object type in question. If so, it's
probably worth saying so at this point.

> Commits
> -------
>
> The hash function naming an origin commit is controlled by the hint
> left in .git for the ref named by HEAD (or for HEAD itself, if HEAD is
> detached) by git checkout --orphan or git init.

This confused me for a while - I think you mean "root commit"?

> TRANSITION PLAN
> ===============
>
> Y4: BLAKE by default for new projects.
>
>     When creating a new working tree, it starts using BLAKE.
>
>     Servers which have been updated will accept BLAKE.

Why not allow newhash pushes before making it the default for new
projects? Wouldn't it make sense to get the server side ready some time
before projects start actively using new hashes?

Or is the idea that newhash upgrade is driven from the server?

What's the upgrade process for send-email patch exchange?

Tony.
-- 
f.anthony.n.finch  <dot@dotat.at>  http://dotat.at/  -  I xn--zr8h punycode
Fair Isle: Southwest 6 to gale 8, backing east 5 or 6, backing north 6 to gale
8 later. Rough or very rough. Rain or showers. Moderate or good.

^ permalink raw reply	[flat|nested] 136+ messages in thread

* Re: SHA1 collisions found
  2017-02-27 13:29     ` René Scharfe
@ 2017-02-28 13:25       ` brian m. carlson
  0 siblings, 0 replies; 136+ messages in thread
From: brian m. carlson @ 2017-02-28 13:25 UTC (permalink / raw)
  To: René Scharfe; +Cc: Duy Nguyen, Joey Hess, Git Mailing List

[-- Attachment #1: Type: text/plain, Size: 3137 bytes --]

On Mon, Feb 27, 2017 at 02:29:18PM +0100, René Scharfe wrote:
> Am 25.02.2017 um 20:04 schrieb brian m. carlson:
> >>> So I think that the current scope left is best estimated by the
> >>> following command:
> >>>
> >>>   git grep -P 'unsigned char\s+(\*|.*20)' | grep -v '^Documentation'
> >>>
> >>> So there are approximately 1200 call sites left, which is quite a bit of
> >>> work.  I estimate between the work I've done and other people's
> >>> refactoring work (such as the refs backend refactor), we're about 40%
> >>> done.
> > 
> > As a note, I've been working on this pretty much nonstop since the
> > collision announcement was made.  After another 27 commits, I've got it
> > down from 1244 to 1119.
> > 
> > I plan to send another series out sometime after the existing series has
> > hit next.  People who are interested can follow the object-id-part*
> > branches at https://github.com/bk2204/git.
> 
> Perhaps the following script can help a bit; it converts local and static
> variables in specified files.  It's just a simplistic parser which can get
> at least shadowing variables, strings and comments wrong, so its results
> need to be reviewed carefully.
> 
> I failed to come up with an equivalent Coccinelle patch so far. :-/
> 
> René
> 
> 
> #!/bin/sh
> while test $# -gt 0
> do
> 	file="$1"
> 	tmp="$file.new"
> 	test -f "$file" &&
> 	perl -e '
> 		use strict;
> 		my %indent;
> 		my %old;
> 		my %new;
> 		my $in_struct = 0;
> 		while (<>) {
> 			if (/^(\s*)}/) {
> 				my $len = length $1;
> 				foreach my $key (keys %indent) {
> 					if ($len < length($indent{$key})) {
> 						delete $indent{$key};
> 						delete $old{$key};
> 						delete $new{$key};
> 					}
> 				}
> 				$in_struct = 0;
> 			}
> 			if (!$in_struct and /^(\s*)(static )?unsigned char (\w+)\[20\];$/) {
> 				my $prefix = "$1$2";
> 				my $name = $3;
> 				$indent{$.} = $1;
> 				$old{$.} = qr/(?<!->)(?<!\.)(?<!-)\b$name\b/;
> 				$name =~ s/sha1/oid/;
> 				print $prefix . "struct object_id " . $name . ";\n";
> 				$new{$.} = $name . ".hash";
> 				next;
> 			}
> 			if (/^(\s*)(static )?struct (\w+ )?\{$/) {
> 				$in_struct = 1;
> 			}
> 			if (!$in_struct and ! /\/\*/) {
> 				foreach my $key (keys %indent) {
> 					s/$old{$key}/$new{$key}/g;
> 				}
> 			}
> 			print;
> 		}
> 	' "$file" >"$tmp" &&
> 	mv "$tmp" "$file" ||
> 	exit 1
> 	shift
> done

I'll see how it works.  I'm currently in New Orleans visiting a friend
until Thursday, so I'll have less time than normal to look at these, but
I'll definitely give it a try.

Most of the issue is not the actual conversion, but finding the right
order in which to convert functions.  For example, the object-id-part8
branch on my GitHub account converts parse_object, but
parse_tree_indirect has to be converted before you can do parse_object.
That leads to another handful of patches that have to be done.
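
To illustrate why the ordering matters, here is a sketch with
simplified prototypes (the real git.git signatures differ): a callee
has to learn about struct object_id before a caller that passes its
argument through can drop its raw hash buffers.

	/* callee converts first... */
	struct tree *parse_tree_indirect(const struct object_id *oid);

	/* ...then a caller that hands its argument through can follow */
	struct object *parse_object(const struct object_id *oid);
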
-- 
brian m. carlson / brian with sandals: Houston, Texas, US
+1 832 623 2791 | https://www.crustytoothpaste.net/~bmc | My opinion only
OpenPGP: https://keybase.io/bk2204

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 868 bytes --]

^ permalink raw reply	[flat|nested] 136+ messages in thread

* Re: SHA1 collisions found
  2017-02-23 23:05                         ` Jeff King
                                             ` (3 preceding siblings ...)
  2017-02-23 23:14                           ` SHA1 collisions found Linus Torvalds
@ 2017-02-28 18:41                           ` Junio C Hamano
  2017-02-28 19:07                             ` Junio C Hamano
  4 siblings, 1 reply; 136+ messages in thread
From: Junio C Hamano @ 2017-02-28 18:41 UTC (permalink / raw)
  To: Jeff King; +Cc: Linus Torvalds, Joey Hess, Git Mailing List

Jeff King <peff@peff.net> writes:

> The first one is 98K. Mail headers may bump it over vger's 100K barrier.
> It's actually the _least_ interesting patch of the 3, because it just
> imports the code wholesale from the other project. But if it doesn't
> make it, you can fetch the whole series from:
>
>   https://github.com/peff/git jk/sha1dc
>
> (By the way, I don't see your version on the list, Linus, which probably
> means it was eaten by the 100K filter).
>
>   [1/3]: add collision-detecting sha1 implementation
>   [2/3]: sha1dc: adjust header includes for git
>   [3/3]: Makefile: add USE_SHA1DC knob

I was lazy so I fetched the above and then added this on top before
I start to play with it.

-- >8 --
From: Junio C Hamano <gitster@pobox.com>
Date: Tue, 28 Feb 2017 10:39:25 -0800
Subject: [PATCH] sha1dc: resurrect LICENSE file

The upstream releases the contents under the MIT license; the
initial import accidentally omitted its license file.  

Add it back.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 sha1dc/LICENSE | 30 ++++++++++++++++++++++++++++++
 1 file changed, 30 insertions(+)
 create mode 100644 sha1dc/LICENSE

diff --git a/sha1dc/LICENSE b/sha1dc/LICENSE
new file mode 100644
index 0000000000..4a3e6a1b15
--- /dev/null
+++ b/sha1dc/LICENSE
@@ -0,0 +1,30 @@
+MIT License
+
+Copyright (c) 2017:
+    Marc Stevens
+    Cryptology Group
+    Centrum Wiskunde & Informatica
+    P.O. Box 94079, 1090 GB Amsterdam, Netherlands
+    marc@marc-stevens.nl
+
+    Dan Shumow
+    Microsoft Research
+    danshu@microsoft.com
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.
-- 
2.12.0-310-g733d1cbbe2


^ permalink raw reply related	[flat|nested] 136+ messages in thread

* Re: SHA1 collisions found
  2017-02-28 18:41                           ` Junio C Hamano
@ 2017-02-28 19:07                             ` Junio C Hamano
  2017-02-28 19:20                               ` Jeff King
  2017-02-28 19:34                               ` Linus Torvalds
  0 siblings, 2 replies; 136+ messages in thread
From: Junio C Hamano @ 2017-02-28 19:07 UTC (permalink / raw)
  To: Jeff King; +Cc: Linus Torvalds, Joey Hess, Git Mailing List

Junio C Hamano <gitster@pobox.com> writes:

>>   [1/3]: add collision-detecting sha1 implementation
>>   [2/3]: sha1dc: adjust header includes for git
>>   [3/3]: Makefile: add USE_SHA1DC knob
>
> I was lazy so I fetched the above and then added this on top before
> I start to play with it.
>
> -- >8 --
> From: Junio C Hamano <gitster@pobox.com>
> Date: Tue, 28 Feb 2017 10:39:25 -0800
> Subject: [PATCH] sha1dc: resurrect LICENSE file

In a way similar to 8415558f55 ("sha1dc: avoid c99
declaration-after-statement", 2017-02-24), we would want this on
top.

-- >8 --
Subject: sha1dc: avoid 'for' loop initial decl

We write this:

	type i;
	for (i = initial; i < limit; i++)

instead of this:

	for (type i = initial; i < limit; i++)

the latter of which is from c99.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 sha1dc/sha1.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/sha1dc/sha1.c b/sha1dc/sha1.c
index f4e261ae7a..6569b403e9 100644
--- a/sha1dc/sha1.c
+++ b/sha1dc/sha1.c
@@ -41,7 +41,8 @@
 
 void sha1_message_expansion(uint32_t W[80])
 {
-	for (unsigned i = 16; i < 80; ++i)
+	unsigned i;
+	for (i = 16; i < 80; ++i)
 		W[i] = rotate_left(W[i - 3] ^ W[i - 8] ^ W[i - 14] ^ W[i - 16], 1);
 }
 
@@ -49,9 +50,10 @@ void sha1_compression(uint32_t ihv[5], const uint32_t m[16])
 {
 	uint32_t W[80];
 	uint32_t a, b, c, d, e;
+	unsigned i;
 
 	memcpy(W, m, 16 * 4);
-	for (unsigned i = 16; i < 80; ++i)
+	for (i = 16; i < 80; ++i)
 		W[i] = rotate_left(W[i - 3] ^ W[i - 8] ^ W[i - 14] ^ W[i - 16], 1);
 
 	a = ihv[0];

^ permalink raw reply related	[flat|nested] 136+ messages in thread

* Re: SHA1 collisions found
  2017-02-28 19:07                             ` Junio C Hamano
@ 2017-02-28 19:20                               ` Jeff King
  2017-03-01  8:57                                 ` Dan Shumow
  2017-02-28 19:34                               ` Linus Torvalds
  1 sibling, 1 reply; 136+ messages in thread
From: Jeff King @ 2017-02-28 19:20 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Dan Shumow, Linus Torvalds, Joey Hess, Git Mailing List

On Tue, Feb 28, 2017 at 11:07:37AM -0800, Junio C Hamano wrote:

> Junio C Hamano <gitster@pobox.com> writes:
> 
> >>   [1/3]: add collision-detecting sha1 implementation
> >>   [2/3]: sha1dc: adjust header includes for git
> >>   [3/3]: Makefile: add USE_SHA1DC knob
> >
> > I was lazy so I fetched the above and then added this on top before
> > I start to play with it.
> >
> > -- >8 --
> > From: Junio C Hamano <gitster@pobox.com>
> > Date: Tue, 28 Feb 2017 10:39:25 -0800
> > Subject: [PATCH] sha1dc: resurrect LICENSE file
> 
> In a way similar to 8415558f55 ("sha1dc: avoid c99
> declaration-after-statement", 2017-02-24), we would want this on
> top.
> 
> -- >8 --
> Subject: sha1dc: avoid 'for' loop initial decl

Yeah, thanks, I had tweaked both that and the license thing locally but
had not pushed it out yet. Both are obvious improvements.

FWIW, I've been in touch with Dan Shumow, one of the authors, who has
been looking at whether we can speed up the sha1dc implementation, or
integrate the checks into the block-sha1 implementation.

Here are a few notes on the earlier timings I posted that came out in
our conversation:

  - the timings I showed earlier were actually openssl versus sha1dc.
    The block-sha1 timings fall somewhere in the middle:

      [running test-sha1 on a fresh linux.git packfile]
      1.347s openssl
      2.079s block-sha1
      4.983s sha1dc

      [index-pack --verify on a fresh git.git packfile]
       6.919s openssl
       9.003s block-sha1
      17.955s sha1dc

    Those are the operations that show off sha1 performance the most.
    The first one is really not even that interesting; it's raw
    sha1 performance. The second one is an actual operation users might
    notice (though not as --verify exactly, but as "index-pack --stdin"
    when receiving a fetch or a push).

    So there's room for improvement, but the gap between block-sha1
    and sha1dc is not quite as big as I showed earlier.

  - Dan timed the sha1dc implementation with and without the collision
    detection enabled. The sha1 implementation is only 1.33x slower than
    block-sha1 (for raw sha1 time). Adding in the detection makes it
    2.6x slower.

    So there's some potential gain from optimizing the sha1
    implementation, but ultimately we may be looking at a 2x slowdown to
    add in the collision detection.

    It doesn't need to happen for _every_ sha1 we compute, but the
    index-pack case is the one that almost certainly _does_ want it,
    because that's when we're admitting remote objects into the
    repository (ditto you'd probably want it for write_sha1_file(),
    since you could be applying a patch from an untrusted source).
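
    For concreteness, a minimal sketch of what such a checked hash
    call could look like, assuming the upstream sha1collisiondetection
    entry points (SHA1DCInit/SHA1DCUpdate/SHA1DCFinal) and git's
    die(); the actual wrapper may well end up looking different:

	#include "sha1dc/sha1.h"

	static void checked_sha1(unsigned char out[20], const char *data, size_t len)
	{
		SHA1_CTX ctx;

		SHA1DCInit(&ctx);
		SHA1DCUpdate(&ctx, data, len);
		/* SHA1DCFinal returns nonzero if a collision attack was detected */
		if (SHA1DCFinal(out, &ctx))
			die("SHA-1 collision fingerprint detected");
	}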

-Peff

^ permalink raw reply	[flat|nested] 136+ messages in thread

* Re: SHA1 collisions found
  2017-02-28 19:07                             ` Junio C Hamano
  2017-02-28 19:20                               ` Jeff King
@ 2017-02-28 19:34                               ` Linus Torvalds
  2017-02-28 19:52                                 ` Shawn Pearce
  2017-02-28 21:22                                 ` Dan Shumow
  1 sibling, 2 replies; 136+ messages in thread
From: Linus Torvalds @ 2017-02-28 19:34 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Jeff King, Joey Hess, Git Mailing List

On Tue, Feb 28, 2017 at 11:07 AM, Junio C Hamano <gitster@pobox.com> wrote:
>
> In a way similar to 8415558f55 ("sha1dc: avoid c99
> declaration-after-statement", 2017-02-24), we would want this on
> top.

There's a few other simplifications that could be done:

 (1) make the symbols static that aren't used.

     The sha1.h header ends up declaring several things that shouldn't
have been exported.

     I suspect the code may have had some debug mode that got stripped
out from it before making it public (or that was never there, and was
just something the generating code could add).

 (2) get rid of the "safe mode" support.

     That one is meant for non-checking replacements where it
generates a *different* hash for input with the collision fingerprint,
but that's pointless for the git use when we abort on a collision
fingerprint.

I think the first one will show that the sha1_compression() function
isn't actually used, and with the removal of safe-mode I think
sha1_compression_W() also is unused.

Finally, only states 58 and 65 (out of all 80 states) are actually
used, and from what I can tell, the 'maski' value is always 0, so the
looping over 80 state masks is really just a loop over two.

The file has code to *generate* all the 80 sha1_recompression_step()
functions, and I don't think the compiler is smart enough to notice
that only two of them matter.

And because 'maski' is always zero, this:

   ubc_dv_mask[sha1_dvs[i].maski]

code looks like it might as well just use ubc_dv_mask[0] - in fact the
ubc_dv_mask[] "array" really is just a single-entry array anyway:

   #define DVMASKSIZE 1

so that code has a few oddities in it. It's generated code, which is
probably why.

Basically, some of it could be improved. In particular, the "generate
code for 80 different recompression cases, but only ever use two of
them" really looks like it would blow up the code generation footprint
a lot.

I'm adding Marc Stevens and Dan Shumow to this email (bcc'd, so that
they don't get dragged into any unrelated email threads) in case they
want to comment.

I'm wondering if they perhaps have a cleaned-up version somewhere, or
maybe they can tell me that I'm just full of sh*t and missed
something.

                    Linus

^ permalink raw reply	[flat|nested] 136+ messages in thread

* Re: SHA1 collisions found
  2017-02-28 19:34                               ` Linus Torvalds
@ 2017-02-28 19:52                                 ` Shawn Pearce
  2017-02-28 22:56                                   ` Linus Torvalds
  2017-02-28 21:22                                 ` Dan Shumow
  1 sibling, 1 reply; 136+ messages in thread
From: Shawn Pearce @ 2017-02-28 19:52 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Junio C Hamano, Jeff King, Joey Hess, Git Mailing List

On Tue, Feb 28, 2017 at 11:34 AM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> On Tue, Feb 28, 2017 at 11:07 AM, Junio C Hamano <gitster@pobox.com> wrote:
>>
>> In a way similar to 8415558f55 ("sha1dc: avoid c99
>> declaration-after-statement", 2017-02-24), we would want this on
>> top.
>
> There's a few other simplifications that could be done:

Yes, I found and did a number of these when I ported sha1dc to Java
for JGit[1], and it helped recover some of the lost throughput.

[1] https://git.eclipse.org/r/#/c/91852/

>  (1) make the symbols static that aren't used.
>
>      The sha1.h header ends up declaring several things that shouldn't
> have been exported.
>
>      I suspect the code may have had some debug mode that got stripped
> out from it before making it public (or that was never there, and was
> just something the generating code could add).
>
>  (2) get rid of the "safe mode" support.
>
>      That one is meant for non-checking replacements where it
> generates a *different* hash for input with the collision fingerprint,
> but that's pointless for the git use when we abort on a collision
> fingerprint.
>
> I think the first one will show that the sha1_compression() function
> isn't actually used, and with the removal of safe-mode I think
> sha1_compression_W() also is unused.

Correct.

> Finally, only states 58 and 65 (out of all 80 states) are actually
> used,

Yes, at present only states 58 and 65 are used. I cut out support for
other states.

> and from what I can tell, the 'maski' value is always 0, so the
> looping over 80 state masks is really just a loop over two.

Actually, look closer at that loop:

  for (i = 0; sha1_dvs[i].dvType != 0; ++i)
  {
    if ((0 == ctx->ubc_check) || (((uint32_t)(1) << sha1_dvs[i].maskb)
                                  & ubc_dv_mask[sha1_dvs[i].maski]))

Its a loop over all 32 bits looking for which bits are set. Most of
the time few bits if any are set for most message blocks. Changing
this code to find the lowest 1 bit set in ubc_dv_mask[0] provided a
significant improvement in throughput.

The sha1_dvs array is indexed by maskb, so the code can be reduced to:

  while (ubcDvMask != 0) {
    int b = numberOfTrailingZeros(ubcDvMask); // index of the lowest set bit
    UbcCheck.DvInfo dv = UbcCheck.DV[b];
    // ... run the check for dv ...
    ubcDvMask &= ubcDvMask - 1; // clear the lowest set bit
  }

Or something like that.

^ permalink raw reply	[flat|nested] 136+ messages in thread

* RE: SHA1 collisions found
  2017-02-28 19:34                               ` Linus Torvalds
  2017-02-28 19:52                                 ` Shawn Pearce
@ 2017-02-28 21:22                                 ` Dan Shumow
  2017-02-28 22:50                                   ` Marc Stevens
  1 sibling, 1 reply; 136+ messages in thread
From: Dan Shumow @ 2017-02-28 21:22 UTC (permalink / raw)
  To: Linus Torvalds, Junio C Hamano
  Cc: Jeff King, Joey Hess, Git Mailing List,
	'marc.stevens@cwi.nl'

[Responses inline]

No need to keep me "bcc'd" (though thanks for the consideration) -- I'm happy to ignore anything I don't want to be pulled into ;-)

Here's a rollup of what needs to be done based on the discussion below:

1) Remove extraneous exports from sha1.h
2) Remove "safe mode" support.
3) Remove sha1_compression_W if it is not needed by the performance improvements.
4) Evaluate logic around storing states and generating recompression states.  Remove defines that bloat code footprint.

Thanks,
Dan


-----Original Message-----
From: linus971@gmail.com [mailto:linus971@gmail.com] On Behalf Of Linus Torvalds
Sent: Tuesday, February 28, 2017 11:34 AM
To: Junio C Hamano <gitster@pobox.com>
Cc: Jeff King <peff@peff.net>; Joey Hess <id@joeyh.name>; Git Mailing List <git@vger.kernel.org>
Subject: Re: SHA1 collisions found

On Tue, Feb 28, 2017 at 11:07 AM, Junio C Hamano <gitster@pobox.com> wrote:
>
> In a way similar to 8415558f55 ("sha1dc: avoid c99 
> declaration-after-statement", 2017-02-24), we would want this on top.

There's a few other simplifications that could be done:

 (1) make the symbols static that aren't used.

     The sha1.h header ends up declaring several things that shouldn't have been exported.

     I suspect the code may have had some debug mode that got stripped out from it before making it public (or that was never there, and was just something the generating code could add).

[danshu] Yes, this is reasonable.  The emphasis of the code, heretofore, had been the illustration of our unavoidable bit condition performance improvement to counter cryptanalysis.  I'm happy to remove the unused stuff from the public header.

 (2) get rid of the "safe mode" support.

     That one is meant for non-checking replacements where it generates a *different* hash for input with the collision fingerprint, but that's pointless for the git use when we abort on a collision fingerprint.

[danshu] Yes, I agree that if you aren't using this it can be taken out.  I believe Marc has some use cases / potentially consumers of this algorithm in mind.  We can move it into separate header/source files for anyone who wants to use it.

I think the first one will show that the sha1_compression() function isn't actually used, and with the removal of safe-mode I think
sha1_compression_W() also is unused.

[danshu]  Some of the performance experiments that I've looked at involve putting the sha1_compression_W(...) back in.  Though, that doesn't look like it's helping.  If it is unused after the performance improvements, we'll take it out, or move it into its own file.

Finally, only states 58 and 65 (out of all 80 states) are actually used, and from what I can tell, the 'maski' value is always 0, so the looping over 80 state masks is really just a loop over two.

[danshu]  So, while looking at performance optimizations, I specifically looked at how much removing storing the intermediate states helps -- And I found that it doesn't seem to make a difference for performance.  My cursory hypothesis is because nothing is waiting on those writes to memory, the code moves on quickly.  That said, it is a bunch of code that is essentially doing nothing and removing that is worthwhile.  Though, partially what we're seeing here is that, as you point out below, we're working with generated code that we want to be general.  Specifically, right now, we're checking only disturbance vectors that we know can be used to efficiently attack the compression function.  It may be the case that further cryptanalysis uncovers more.  We want to have a general enough approach that we can add scanning for new disturbance vectors if they're found later.  Over specializing the code makes that more difficult, as currently the algorithm is data driven, and we don't need to write new code, but rather just add more data to check.  One other note -- the "maski" field of the  dv_info_t struct is not an index to check the state, but rather an index into the mask generated by the ubc check code, so that doesn't pertain to looping over the states.  More on this below.  

The file has code to *generate* all the 80 sha1_recompression_step() functions, and I don't think the compiler is smart enough to notice that only two of them matter.

[danshu] That's a good observation -- We should clean up the unused recompression steps, especially because that will generate a ton of object code.  We should add some logic to only compile the functions that are used.

And because 'maski' is always zero, this:

   ubc_dv_mask[sha1_dvs[i].maski]

code looks like it might as well just use ubc_dv_mask[0] - in fact the ubc_dv_mask[] "array" really is just a single-entry array anyway:

   #define DVMASKSIZE 1

[danshu]  The idea here is that we are currently checking 32 disturbance vectors with our bit mask.  We're checking 32 DVs, because we have 32 bits of mask that we can use.  The DVs are ordered by their probability of leading to an attack (which is directly correlated to the complexity of finding a collision.)  Several of those DVs correspond to very low probability / high cost attacks, which we wouldn't expect to see in practice.  We just have the space to check, so why not?  However, improvements in cryptanalysis may make those attacks cheaper, in which case, we would potentially want to add more DVs to check, in which case we would expand the number of DVs and the mask.

so that code has a few oddities in it. It's generated code, which is probably why.

[danshu]  Accurate, we're also just trying to be general enough that we can easily add more DVs later if need be.  I don't know how likely that is, certainly the DVs that we're checking now are based on solid conjectures and rigorous analysis of the problem.  Though we don't want to rule out that there will be subsequent cryptanalytic developments later.  Marc can comment more here.

Basically, some of it could be improved. In particular, the "generate code for 80 different recompression cases, but only ever use two of them" really looks like it would blow up the code generation footprint a lot.

I'm adding Marc Stevens and Dan Shumow to this email (bcc'd, so that they don't get dragged into any unrelated email threads) in case they want to comment.

I'm wondering if they perhaps have a cleaned-up version somewhere, or maybe they can tell me that I'm just full of sh*t and missed something.

[danshu]  Naw man, it looks pretty good, modulo a little bit of understandable confusion over 'maski' -- No fake news or alternative facts here ;-)

                    Linus

^ permalink raw reply	[flat|nested] 136+ messages in thread

* Re: Transition plan for git to move to a new hash function
  2017-02-27 13:00                             ` Transition plan for git to move to a new hash function Ian Jackson
  2017-02-27 14:37                               ` Why BLAKE2? Markus Trippelsdorf
  2017-02-27 19:26                               ` Transition plan for git to move to a new hash function Tony Finch
@ 2017-02-28 21:47                               ` brian m. carlson
  2017-03-02 18:13                                 ` Ian Jackson
  2 siblings, 1 reply; 136+ messages in thread
From: brian m. carlson @ 2017-02-28 21:47 UTC (permalink / raw)
  To: Ian Jackson
  Cc: Jeff King, Ævar Arnfjörð Bjarmason, Linus Torvalds,
	Jason Cooper, ankostis, Junio C Hamano, Git Mailing List,
	Stefan Beller, David Lang, Joey Hess

[-- Attachment #1: Type: text/plain, Size: 3838 bytes --]

On Mon, Feb 27, 2017 at 01:00:01PM +0000, Ian Jackson wrote:
> I said I was working on a transition plan.  Here it is.  This is
> obviously a draft for review, and I have no official status in the git
> project.  But I have extensive experience of protocol compatibility
> engineering, and I hope this will be helpful.
> 
> Ian.
> 
> 
> Subject: Transition plan for git to move to a new hash function
> 
> 
> BASIC PRINCIPLES
> ================
> 
> We run multiple hashes in parallel.  Each object is named by exactly
> one hash.  We define that objects with identical content, but named by
> different hash functions, are different objects.

I think this is fine.

> Objects of one hash may refer to objects named by a different hash
> function to their own.  Preference rules arrange that normally, new
> hash objects refer to other new hash objects.

The existing codebase isn't really intended with that in mind.

It's not that I am arguing against this because I think it's a bad
idea; I'm arguing against it because, as a contributor, I'm doubtful
that this is easily achievable given the state of the codebase.

> The intention is that for most projects, the existing SHA-1 based
> history will be retained and a new history built on top of it.
> (Rewriting is also possible but means a per-project hard switch.)

I like Peff's suggested approach in which we essentially rewrite history
under the hood, but have a lookup table which looks up the old hash
based on the new hash.  That allows us to refer to old objects, but not
have to share serialized data that mentions both hashes.

Obviously only the SHA-1 versions of old tags and commits will be able
to be validated, but that shouldn't be an issue.  We can hook that code
into a conversion routine that can handle on-the-fly object conversion.
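
A sketch of what such a lookup table entry might hold (struct and
field names made up for illustration):

	struct hash_xref {
		struct object_id newhash;	/* the object's new-hash name */
		unsigned char sha1[20];		/* its legacy SHA-1 name */
	};

The table would be keyed by newhash; old SHA-1 names are resolved
through it on demand.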

We can also implement (optionally disabled) fallback functionality to
look up old SHA-1 hash names based on the new hash.

> We extend the textual object name syntax to explicitly name the hash
> used.  Every program that invokes git or speaks git protocols will
> need to understand the extended object name syntax.
> 
> Packfiles need to be extended to be able to contain objects named by
> new hash functions.  Blob objects with identical contents but named by
> different hash functions would ideally share storage.
> 
> Safety catches prevent accidental incorporation into a project of
> incompatibly-new objects, or additional deprecatedly-old objects.
> This allows for incremental deployment.

We have a compatibility mechanism already in place: if the
repositoryFormatVersion option is set to 1, but an unknown extension
flag is set, Git will bail out.

For network protocols, we have the server offer a hash=foo extension,
and make the client echo it back, and either bail or convert on the fly.
This makes it fast for new clients, and slow for old clients, which
encourages migration.
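
A hypothetical capability exchange might look like the following
(syntax invented for illustration, not an actual protocol extension):

	S: ...multi_ack side-band-64k hash=sha256
	C: want <object-id> hash=sha256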

We could also store old-style packs for easy fetch by clients.

> TEXTUAL SYNTAX
> ==============
> 
> The object name textual syntax is extended.  The new syntax may be
> used in all textual git objects and protocols (commits, tags, command
> lines, etc.).
> 
> We declare that the object name syntax is henceforth
>   [A-Z]+[0-9a-z]+ | [0-9a-f]+
> and that names [A-Z].* are deprecated as ref name components.

I'd simply say that we have data always be in the new format if it's
available, and tag the old SHA-1 versions instead.  Otherwise, as Peff
pointed out, we're going to be stuck typing a bunch of identical stuff
every time.  Again, this encourages migration.
-- 
brian m. carlson / brian with sandals: Houston, Texas, US
+1 832 623 2791 | https://www.crustytoothpaste.net/~bmc | My opinion only
OpenPGP: https://keybase.io/bk2204

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 868 bytes --]

^ permalink raw reply	[flat|nested] 136+ messages in thread

* Re: SHA1 collisions found
  2017-02-28 21:22                                 ` Dan Shumow
@ 2017-02-28 22:50                                   ` Marc Stevens
  2017-02-28 23:11                                     ` Linus Torvalds
  0 siblings, 1 reply; 136+ messages in thread
From: Marc Stevens @ 2017-02-28 22:50 UTC (permalink / raw)
  To: Dan Shumow, Linus Torvalds, Junio C Hamano
  Cc: Jeff King, Joey Hess, Git Mailing List

Feel free to keep me in this thread as well, so we can help with or
answer any further questions.  I also appreciate the feedback on our
project.

Like Dan Shumow said, our main focus for the library has been
correctness and then performance.  The files ubc_check.c and
ubc_check.h are generated entirely from cryptanalytic data, in
particular a list of 32 disturbance vectors and their unavoidable
attack conditions; see
https://github.com/cr-marcstevens/sha1collisiondetection-tools.

sha1.c and sha1.h were coded to work with any such generated
ubc_check.c and ubc_check.h.  That means we might indeed have some
superfluous code, once used for testing or for generality, but nothing
that should noticeably impact runtime performance.

Because we only have 32 disturbance vectors to check, we have DVMASKSIZE
equal to 1 and maski always 0.
In the more general case when we add disturbance vectors this will not
remain the case.

Of course for dedicated code this can be simplified, and some parts
could be further optimized.

Regarding the recompression functions, the ones needed are given in the
sha1_dvs table,
but also via preprocessor defines that are used to actually only store
needed internal states:
#define DOSTORESTATE58
#define DOSTORESTATE65
For each disturbance vector there is a window of states you can start
the recompression from; we've optimized it so that there are only 2
unique starting points (58, 65) instead of 32.  These defines should
be easy to use to remove the superfluous compiled recompression
functions.
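
Concretely, the culling could look something like this sketch (macro
name modeled on the generated sha1.c), compiling only the two
recompression functions the disturbance vector table references:

	#ifdef DOSTORESTATE58
	SHA1_RECOMPRESS(58)
	#endif

	#ifdef DOSTORESTATE65
	SHA1_RECOMPRESS(65)
	#endif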

Note that as each disturbance vector has its own unique message differences
(leading to different values for ctx->m2), our code does not loop over
just 2 items.
It loops over 32 distinct computations which have either of the 2
starting points.

Finally, thanks for taking a close look at our code; this helps bring
the library into better shape for other software projects too.

Best regards,
Marc Stevens

On 2/28/2017 10:22 PM, Dan Shumow wrote:
> [Responses inline]
>
> No need to keep me "bcc'd" (though thanks for the consideration) -- I'm happy to ignore anything I don't want to be pulled into ;-)
>
> Here's a rollup of what needs to be done based on the discussion below:
>
> 1) Remove extraneous exports from sha1.h
> 2) Remove "safe mode" support.
> 3) Remove sha1_compression_W if it is not needed by the performance improvements.
> 4) Evaluate logic around storing states and generating recompression states.  Remove defines that bloat code footprint.
>
> Thanks,
> Dan
>
>
> -----Original Message-----
> From: linus971@gmail.com [mailto:linus971@gmail.com] On Behalf Of Linus Torvalds
> Sent: Tuesday, February 28, 2017 11:34 AM
> To: Junio C Hamano <gitster@pobox.com>
> Cc: Jeff King <peff@peff.net>; Joey Hess <id@joeyh.name>; Git Mailing List <git@vger.kernel.org>
> Subject: Re: SHA1 collisions found
>
> On Tue, Feb 28, 2017 at 11:07 AM, Junio C Hamano <gitster@pobox.com> wrote:
>> In a way similar to 8415558f55 ("sha1dc: avoid c99 
>> declaration-after-statement", 2017-02-24), we would want this on top.
> There's a few other simplifications that could be done:
>
>  (1) make the symbols static that aren't used.
>
>      The sha1.h header ends up declaring several things that shouldn't have been exported.
>
>      I suspect the code may have had some debug mode that got stripped out from it before making it public (or that was never there, and was just something the generating code could add).
>
> [danshu] Yes, this is reasonable.  The emphasis of the code, heretofore, had been the illustration of our unavoidable bit condition performance improvement to counter cryptanalysis.  I'm happy to remove the unused stuff from the public header.
>
>  (2) get rid of the "safe mode" support.
>
>      That one is meant for non-checking replacements where it generates a *different* hash for input with the collision fingerprint, but that's pointless for the git use when we abort on a collision fingerprint.
>
> [danshu] Yes, I agree that if you aren't using this it can be taken out.  I believe Marc has some use cases / potentially consumers of this algorithm in mind.  We can move it into separate header/source files for anyone who wants to use it.
>
> I think the first one will show that the sha1_compression() function isn't actually used, and with the removal of safe-mode I think
> sha1_compression_W() also is unused.
>
> [danshu]  Some of the performance experiments that I've looked at involve putting the sha1_compression_W(...) back in.  Though, that doesn't look like it's helping.  If it is unused after the performance improvements, we'll take it out, or move it into its own file.
>
> Finally, only states 58 and 65 (out of all 80 states) are actually used, and from what I can tell, the 'maski' value is always 0, so the looping over 80 state masks is really just a loop over two.
>
> [danshu]  So, while looking at performance optimizations, I specifically looked at how much removing storing the intermediate states helps -- And I found that it doesn't seem to make a difference for performance.  My cursory hypothesis is because nothing is waiting on those writes to memory, the code moves on quickly.  That said, it is a bunch of code that is essentially doing nothing and removing that is worthwhile.  Though, partially what we're seeing here is that, as you point out below, we're working with generated code that we want to be general.  Specifically, right now, we're checking only disturbance vectors that we know can be used to efficiently attack the compression function.  It may be the case that further cryptanalysis uncovers more.  We want to have a general enough approach that we can add scanning for new disturbance vectors if they're found later.  Over specializing the code makes that more difficult, as currently the algorithm is data driven, and we don't need to write new code, but rather just add more data to check.  One other note -- the "maski" field of the  dv_info_t struct is not an index to check the state, but rather an index into the mask generated by the ubc check code, so that doesn't pertain to looping over the states.  More on this below.  
>
> The file has code to *generate* all the 80 sha1_recompression_step() functions, and I don't think the compiler is smart enough to notice that only two of them matter.
>
> [danshu] That's a good observation -- We should clean up the unused recompression steps, especially because that will generate a ton of object code.  We should add some logic to only compile the functions that are used.
>
> And because 'maski' is always zero, this:
>
>    ubc_dv_mask[sha1_dvs[i].maski]
>
> code looks like it might as well just use ubc_dv_mask[0] - in fact the ubc_dv_mask[] "array" really is just a single-entry array anyway:
>
>    #define DVMASKSIZE 1
>
> [danshu]  The idea here is that we are currently checking 32 disturbance vectors with our bit mask.  We're checking 32 DVs, because we have 32 bits of mask that we can use.  The DVs are ordered by their probability of leading to an attack (which is directly correlated to the complexity of finding a collision.)  Several of those DVs correspond to very low probability / high cost attacks, which we wouldn't expect to see in practice.  We just have the space to check, so why not?  However, improvements in cryptanalysis may make those attacks cheaper, in which case, we would potentially want to add more DVs to check, in which case we would expand the number of DVs and the mask.
>
> so that code has a few oddities in it. It's generated code, which is probably why.
>
> [danshu]  Accurate, we're also just trying to be general enough that we can easily add more DVs later if need be.  I don't know how likely that is, certainly the DVs that we're checking now are based on solid conjectures and rigorous analysis of the problem.  Though we don't want to rule out that there will be subsequent cryptanalytic developments later.  Marc can comment more here.
>
> Basically, some of it could be improved. In particular, the "generate code for 80 different recompression cases, but only ever use two of them" really looks like it would blow up the code generation footprint a lot.
>
> I'm adding Marc Stevens and Dan Shumow to this email (bcc'd, so that they don't get dragged into any unrelated email threads) in case they want to comment.
>
> I'm wondering if they perhaps have a cleaned-up version somewhere, or maybe they can tell me that I'm just full of sh*t and missed something.
>
> [danshu]  Naw man, it looks pretty good, modulo a little bit of understandable confusion over 'maski' -- No fake news or alternative facts here ;-)
>
>                     Linus



^ permalink raw reply	[flat|nested] 136+ messages in thread

* Re: SHA1 collisions found
  2017-02-28 19:52                                 ` Shawn Pearce
@ 2017-02-28 22:56                                   ` Linus Torvalds
  0 siblings, 0 replies; 136+ messages in thread
From: Linus Torvalds @ 2017-02-28 22:56 UTC (permalink / raw)
  To: Shawn Pearce; +Cc: Junio C Hamano, Jeff King, Joey Hess, Git Mailing List

On Tue, Feb 28, 2017 at 11:52 AM, Shawn Pearce <spearce@spearce.org> wrote:
>
>> and from what I can tell, the 'maski' value is always 0, so the
>> looping over 80 state masks is really just a loop over two.
>
> Actually, look closer at that loop:

No, sorry, I wasn't clear and took some shortcuts in writing that made
that sentence not parse right.

There's two issues going on. This loop:

>   for (i = 0; sha1_dvs[i].dvType != 0; ++i)

loops over all the dvs - and then inside it has that useless "maski"
thing as part of the test that is always zero.

But the "80 state masks" was not that "maski' value, but the
"ctx->states[5][80]" thing.

So we have 80 of those 5-word state values, but only two of them are
actually used (iterations 58 and 65). You can see how the code
actually optimizes away (by hand) the SHA1_STORE_STATE() thing by
using DOSTORESTATE58 and DOSTORESTATE65, but it does actually generate
the code for all of them.

You can see the "only two steps" part in this:

  (sha1_recompression_step[sha1_dvs[i].testt])(...)

if you notice how there are only those two different cases for "testt".

So there is code generated for 80 different recompression step
functions in that array, but there are only two different functions
that are actually ever used.

Those are not small functions, either. When I check the build, they
generate about 3.5kB of code each. So it's literally about 250kB of
completely wasted space in the binary.

See what I'm saying? Two different issues. One is that the code
generates tons of (fairly big) functions, and only uses two of them,
the rest are useless and lying around. The other is that it uses a
variable that is only ever zero.

So I think that loop would actually be better not as a loop at all,
but as a "generated code expanded from the dv_data". It would have
been more obvious. Right now it loads values from the array, and it's
not obvious that some of the values it loads are very very limited (to
the point of one of them just being the constant "0").

Anyway, Dan Shumow already answered and addressed both issues (and the
smaller stylistic ones).

               Linus

^ permalink raw reply	[flat|nested] 136+ messages in thread

* Re: SHA1 collisions found
  2017-02-28 22:50                                   ` Marc Stevens
@ 2017-02-28 23:11                                     ` Linus Torvalds
  2017-03-01 19:05                                       ` Jeff King
  0 siblings, 1 reply; 136+ messages in thread
From: Linus Torvalds @ 2017-02-28 23:11 UTC (permalink / raw)
  To: Marc Stevens
  Cc: Dan Shumow, Junio C Hamano, Jeff King, Joey Hess,
	Git Mailing List

On Tue, Feb 28, 2017 at 2:50 PM, Marc Stevens <marc.stevens@cwi.nl> wrote:
>
> Because we only have 32 disturbance vectors to check, we have DVMASKSIZE
> equal to 1 and maski always 0.
> In the more general case when we add disturbance vectors this will not
> remain the case.

Ok, I didn't get why that happened, but it makes sense to me now.

> Of course for dedicated code this can be simplified, and some parts
> could be further optimized.

So I'd be worried about changing your tested code too much, since the
only test-cases we have are the two pdf files. If we screw up too
much, those will no longer show as collisions, but we could get tons
of false positives that we wouldn't see, so..

I'm wondering whether, since the disturbance vector cases themselves
are fairly limited in number, the code that generates this could be
changed to actually just generate the static calls rather than the
loop over the sha1_dvs[] array.

IOW, instead of generating this:

        for (i = 0; sha1_dvs[i].dvType != 0; ++i) {
                .. use sha1_dvs[i] values as indexes into function arrays etc..
        }

maybe you could just make the generator generate that loop statically,
and have 32 function calls with the masks as constant arguments.

.. together with only generating the SHA1_RECOMPRESS() functions for
the cases that are actually used.

So it would still be entirely generated, but it would generate a
little bit more explicit code.

Of course, we could just edit out all the SHA_RECOMPRESS(x) cases by
hand, and only leave the two that are actually used.

As it is, the lib/sha1.c code generates about 250kB of code that is
never used if I read the code correctly (that's just the sha1.c code -
entirely ignoring all the avx2 etc versions that I haven't looked at,
and that I don't think git would use)

> Regarding the recompression functions, the ones needed are given in the
> sha1_dvs table,
> but also via preprocessor defines that are used to actually only store
> needed internal states:
> #define DOSTORESTATE58
> #define DOSTORESTATE65

Yeah, I guess we could use those #define's to cull the code "automatically".

But I think you could do it at the generation phase easily too, so
that we don't then introduce unnecessary differences when we try to
get rid of the extra fat ;)


> Note that as each disturbance vector has its own unique message differences
> (leading to different values for ctx->m2), our code does not loop over
> just 2 items.
> It loops over 32 distinct computations which have either of the 2
> starting points.

Yes, I already had to clarify my badly expressed writing on the git
list. My concerns were about the (very much not obvious) limited
values in the dvs array.

So the code superficially *looks* like it uses all those functions you
generate (and the maski value _looked_ like it was interesting), but
when looking closer it turns out that there's just two different
function calls that it loops over (but it loops over them multiple
times, I agree).

                           Linus

^ permalink raw reply	[flat|nested] 136+ messages in thread

* RE: SHA1 collisions found
  2017-02-28 19:20                               ` Jeff King
@ 2017-03-01  8:57                                 ` Dan Shumow
  0 siblings, 0 replies; 136+ messages in thread
From: Dan Shumow @ 2017-03-01  8:57 UTC (permalink / raw)
  To: Jeff King, Junio C Hamano; +Cc: Linus Torvalds, Joey Hess, Git Mailing List

>   - Dan timed the sha1dc implementation with and without the collision
>     detection enabled. The sha1 implementation is only 1.33x slower than
>     block-sha1 (for raw sha1 time). Adding in the detection makes it
>     2.6x slower.
>
>     So there's some potential gain from optimizing the sha1
>     implementation, but ultimately we may be looking at a 2x slowdown to
>     add in the collision detection.

I rearranged our code a little bit and interleaved the message expansion and rounds.  This brings our raw SHA-1 implementation (without collision detection) down to 1.11x slower than the block-sha1 implementation in Git.  Adding the collision detection brings us to 2.12x slower than the block-sha1 implementation.  This was basically attacking the low hanging fruit in optimizing our implementation.  There are some things that I haven't looked into yet, but I'm basically at the point of starting to compare the generated assembler to see what's different between our implementations.
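
Roughly, the interleaving idea: instead of expanding all 80 message
words in one pass and then running the rounds, compute each expanded
word right before the round that consumes it.  A sketch, not the
actual patch; ROUND() is a hypothetical stand-in for the round body:

	for (i = 16; i < 80; ++i) {
		W[i] = rotate_left(W[i - 3] ^ W[i - 8] ^ W[i - 14] ^ W[i - 16], 1);
		ROUND(i, W[i]);	/* round i consumes W[i] immediately */
	}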

OpenSSL's SHA1 is implemented in assembler, so there's no way we're going to get close to that with just C-level coding.

Thanks,
Dan

^ permalink raw reply	[flat|nested] 136+ messages in thread

* Re: SHA1 collisions found
  2017-02-28 23:11                                     ` Linus Torvalds
@ 2017-03-01 19:05                                       ` Jeff King
  0 siblings, 0 replies; 136+ messages in thread
From: Jeff King @ 2017-03-01 19:05 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Marc Stevens, Dan Shumow, Junio C Hamano, Joey Hess,
	Git Mailing List

On Tue, Feb 28, 2017 at 03:11:32PM -0800, Linus Torvalds wrote:

> > Of course for dedicated code this can be simplified, and some parts
> > could be further optimized.
> 
> So I'd be worried about changing your tested code too much, since the
> only test-cases we have are the two pdf files. If we screw up too
> much, those will no longer show as collisions, but we could get tons
> of false positives that we wouldn't see, so..

I can probably help with collecting data for that part on GitHub.

I don't have an exact count of how many sha1 computations we do in a
day, but it's...a lot. Obviously every pushed object gets its sha1
computed, but read operations also cover every commit and tree via
parse_object() (though I think most of the blob reads do not).

So it would be trivial to start by swapping out the "die()" on collision
with something that writes to a log. This is the slow path that we don't
expect to trigger at all, so log volume shouldn't be a problem.

I've been waiting to see how speedups develop before deploying it in
production.

-Peff

^ permalink raw reply	[flat|nested] 136+ messages in thread

* Re: Transition plan for git to move to a new hash function
  2017-02-28 21:47                               ` brian m. carlson
@ 2017-03-02 18:13                                 ` Ian Jackson
  2017-03-04 22:49                                   ` brian m. carlson
  0 siblings, 1 reply; 136+ messages in thread
From: Ian Jackson @ 2017-03-02 18:13 UTC (permalink / raw)
  To: brian m. carlson
  Cc: Jeff King, Ævar Arnfjörð Bjarmason, Linus Torvalds,
	Jason Cooper, ankostis, Junio C Hamano, Git Mailing List,
	Stefan Beller, David Lang, Joey Hess

brian m. carlson writes ("Re: Transition plan for git to move to a new hash function"):
> On Mon, Feb 27, 2017 at 01:00:01PM +0000, Ian Jackson wrote:
> > Objects of one hash may refer to objects named by a different hash
> > function to their own.  Preference rules arrange that normally, new
> > hash objects refer to other new hash objects.
> 
> The existing codebase isn't really intended with that in mind.

Yes.  I've seen the attempts to start to replace char* with a hash
struct.

> I like Peff's suggested approach in which we essentially rewrite history
> under the hood, but have a lookup table which looks up the old hash
> based on the new hash.  That allows us to refer to old objects, but not
> have to share serialized data that mentions both hashes.

I think this means that when a project converts, every copy of the
history must be rewritten (separately).  Also, this leaves the whole
system lacking in algorithm agility, meaning we may have to do all of
this again some time.

I also think that we need to distinguish old hashes from new hashes in
the command line interface etc.  Otherwise there is a possible
ambiguity.

> > The object name textual syntax is extended.  The new syntax may be
> > used in all textual git objects and protocols (commits, tags, command
> > lines, etc.).
> > 
> > We declare that the object name syntax is henceforth
> >   [A-Z]+[0-9a-z]+ | [0-9a-f]+
> > and that names [A-Z].* are deprecated as ref name components.
> 
> I'd simply say that we have data always be in the new format if it's
> available, and tag the old SHA-1 versions instead.  Otherwise, as Peff
> pointed out, we're going to be stuck typing a bunch of identical stuff
> every time.  Again, this encourages migration.

The hash identifier is only one character.  Object names are not
normally typed very much anyway.

If you say we must decorate old hashes, then all existing data
everywhere in the world which refers to any git objects by object name
will become invalid.  I don't mean just data in git here.  I mean CI
systems, mailing list archives, commit messages (perhaps in other
version control systems), test cases, and so on.

Ian.

^ permalink raw reply	[flat|nested] 136+ messages in thread

* Re: SHA1 collisions found
  2017-02-25  0:39       ` Linus Torvalds
                           ` (2 preceding siblings ...)
  2017-02-25  6:10         ` Junio C Hamano
@ 2017-03-02 19:55         ` Linus Torvalds
  2017-03-02 20:43           ` Junio C Hamano
                             ` (2 more replies)
  3 siblings, 3 replies; 136+ messages in thread
From: Linus Torvalds @ 2017-03-02 19:55 UTC (permalink / raw)
  To: Jeff King; +Cc: Junio C Hamano, Ian Jackson, Joey Hess, Git Mailing List

On Fri, Feb 24, 2017 at 4:39 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> Honestly, I think that a primary goal for a new hash implementation
> absolutely needs to be to minimize mixing.
>
> Not for security issues, but because of combinatorics. You want to
> have a model that basically reads old data, but that very aggressively
> approaches "new data only" in order to avoid the situation where you
> have basically the exact same tree state, just _represented_
> differently.
>
> For example, what I would suggest the rules be is something like this:

Hmm. Having looked at this a fair amount, and in particular having
looked at the code as part of the hash typesafety patch I did, I am
actually starting to think that it would not be too painful at all to
have a totally different approach, which might be a lot easier to do.

So bear with me, let me try to explain my thinking:

 (a) if we want to be backwards compatible and not force people to
convert their trees on some flag day, we're going to be stuck with
having to have the SHA1 code around and all the existing object
parsing for basically forever

Now, this basically means that the _easy_ solution would be that we
just do the flag day, switch to sha-256, extend everything to 32-byte
hashes, and just have a "git2 fast-import" that makes it easy to
convert stuff.

But it really would be a completely different version of git, with a
new pack-file format and no real compatibility. Such a flag-day
approach would certainly have advantages: it would allow for just
re-architecting some bad choices:

 - make the hashing be something that can be threaded (ie maybe we can
just block it up in 4MB chunks that you can hash in parallel, and make
the git object hash be the hash of hashes)

 - replace zlib with something like zstd

 - get rid of old pack formats etc.

but on the whole, I still think that the compatibility would be worth
much more than the possible technical advantages of a clean slate
restart.

 (b) the SHA1 hash is actually still quite strong, and the collision
detection code effectively means that we don't really have to worry
about collisions in the immediate future.

In other words, the mitigation of the current attack is actually
really easy technically (modulo perhaps the performance concerns), and
there's still nothing fundamentally wrong with using SHA1 as a content
hash. It's still a great hash.

Now, my initial reaction (since it's been discussed for so long
anyway) was obviously "pick a different hash". That was everybody's
initial reaction, I think.

But I'm starting to think that maybe that initial obvious reaction was wrong.

The thing that makes collision attacks so nasty is that our reaction
to a collision is so deadly.  But that's not necessarily fundamental:
we certainly use hashes with collisions every day, and they work
fine. And they work fine because the code that uses those hashes is
designed to simply deal gracefully - although very possibly with a
performance degradation - with two different things hashing to the
same bucket.

So what if the solution to "SHA1 has potential collisions" is "any
hash can have collisions in theory, let's just make sure we handle
them gracefully"?

Because I'm looking at our data structures that have hashes in them,
and many of them are actually of the type where I go

  "Hmm..  Replacing the hash entirely is really really painful - but
it wouldn't necessarily be all that nasty to extend the format to have
additional version information".

and the advantage of that approach is that it actually makes the
compatibility part trivial. No more "have to pick a different hash and
two different formats", and more of a "just the same format with
extended information that might not be there for old objects".

So we have a few different types of formats:

 - the purely local data structures: the pack index file, the file
index, our refs etc

   These we could change completely, and it wouldn't even be all
that painful. The pack index has already gone through versions, and it
doesn't affect anything else.

 - the actual objects.

   These are fairly painful to change, particularly things like the
"tree" object which is probably the worst designed of the lot. Making
it contain a fixed-size binary thing was really a nasty mistake. My
bad.

 - the pack format and the protocol to exchange "I have this" information

   This is *really* painful to change, because it contains not just
the raw object data, but it obviously ends up being the wire format
for remote accesses.

and it turns out that *all* of these formats look like they would be
fairly easy to extend to having extra object version information. Some
of that extra object version information we already have and don't
use, in fact.

Even the tree format, with the annoying fixed-size binary blob. Yes,
it has that fixed size binary blob, but it has it in _addition_ to the
ASCII textual form that would be really easy to just extend upon. We
have that "tree entry type" that we've already made extensions with by
using it for submodules. It would be quite easy to just say that a
tree entry also has a "file version" field, so that you can have
multiple objects that just hash to the same SHA1, and git wouldn't
even *care*.
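
As a sketch of what that might look like on disk (the version field
being purely hypothetical):

	<mode> <name>\0<20-byte SHA-1>			tree entry today
	<mode> <name>\0<20-byte SHA-1><version>		hypothetical extension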

The transfer protocol is the same: yes, we transfer hashes around, but
it would not be all that hard to extend it to "transfer hash and
object version".

And the difference is that then the "backwards compatibility" part
just means interacting with somebody who didn't know to transfer the
object version. So suddenly being backwards compatible isn't a whole
different object parsing thing, it's just a small extension.

IOW, we could make it so that the SHA1 is just a hash into a list of
objects. Even the pack index format wouldn't need to change - right
now we assume that an index hit gives us the direct pointer into the
pack file, but we *could* just make it mean that it gives us a direct
pointer to the first object in the pack file with that SHA1 hash.
Exactly like you'd normally use a hash table with linear probing.
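
A minimal sketch of that lookup discipline, with hypothetical helper
names rather than actual git.git API:

	struct object *find_object(const unsigned char *sha1, unsigned version)
	{
		struct object *o = first_with_sha1(sha1);	/* pack index hit */

		while (o && o->version != version)
			o = next_with_same_sha1(o);	/* linear probe */
		return o;
	}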

Linear probing is usually considered a horrible approach to hash
tables, but it's actually a really useful one for the case where
collisions are very rare.
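
To make that concrete, here is a minimal sketch of such a lookup (the
index layout here is hypothetical, not git's real .idx format): a
lower-bound binary search finds the *first* entry with the wanted
SHA1, and the caller then probes forward while the hash still matches.

    #include <string.h>
    #include <stddef.h>

    /* Hypothetical sorted-index entry; not git's actual .idx layout. */
    struct idx_entry {
        unsigned char sha1[20];
        unsigned long pack_offset;
    };

    /* Lower-bound binary search: returns the first slot whose sha1
     * equals the key, or -1.  Colliding entries sit adjacent, so the
     * caller linearly probes forward from the returned slot. */
    static long find_first(const struct idx_entry *idx, size_t n,
                           const unsigned char *sha1)
    {
        size_t lo = 0, hi = n;
        while (lo < hi) {
            size_t mid = lo + (hi - lo) / 2;
            if (memcmp(idx[mid].sha1, sha1, 20) < 0)
                lo = mid + 1;
            else
                hi = mid;
        }
        if (lo < n && !memcmp(idx[lo].sha1, sha1, 20))
            return (long)lo;
        return -1;
    }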

Anyway, I do have a suggestion for what the "object version" would be,
but I'm not even going to mention it, because I want people to first
think about the _concept_ and not the implementation.

So: What do you think about the concept?

               Linus

^ permalink raw reply	[flat|nested] 136+ messages in thread

* Re: SHA1 collisions found
  2017-03-02 19:55         ` Linus Torvalds
@ 2017-03-02 20:43           ` Junio C Hamano
  2017-03-02 21:21             ` Linus Torvalds
  2017-03-03 11:04           ` Jeff King
  2017-03-03 21:47           ` Stefan Beller
  2 siblings, 1 reply; 136+ messages in thread
From: Junio C Hamano @ 2017-03-02 20:43 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Jeff King, Ian Jackson, Joey Hess, Git Mailing List

Linus Torvalds <torvalds@linux-foundation.org> writes:

> Anyway, I do have a suggestion for what the "object version" would be,
> but I'm not even going to mention it, because I want people to first
> think about the _concept_ and not the implementation.
>
> So: What do you think about the concept?

My reaction heavily depends on how that "object version" thing
works.  When I think I have "variant #1" of an object and say

  have 860cd699c285f02937a2edbdb78e8231292339a5#1

is there any guarantee that the other end has a (small) set of
different objects all sharing the same SHA-1 and it thinks it has
"variant #1" only when it has the same thing as I have (otherwise,
it may have "variant #2" that is an unrelated object but happens to
share the same hash)?  If so, I think I understand how things would
work within your "concept".  But otherwise, I am not really sure.

Would "object version" be like a truncated SHA-1 over the same data
but with different IV or something, i.e. something that guarantees
anybody would get the same result given the data to be hashed?



^ permalink raw reply	[flat|nested] 136+ messages in thread

* Re: SHA1 collisions found
  2017-03-02 20:43           ` Junio C Hamano
@ 2017-03-02 21:21             ` Linus Torvalds
  2017-03-02 21:54               ` Joey Hess
  0 siblings, 1 reply; 136+ messages in thread
From: Linus Torvalds @ 2017-03-02 21:21 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Jeff King, Ian Jackson, Joey Hess, Git Mailing List

On Thu, Mar 2, 2017 at 12:43 PM, Junio C Hamano <gitster@pobox.com> wrote:
>
> My reaction heavily depends on how that "object version" thing
> works.
>
> Would "object version" be like a truncated SHA-1 over the same data
> but with different IV or something, i.e. something that guarantees
> anybody would get the same result given the data to be hashed?

Yes, it does need to be that in practice. So what I was thinking the
object version would be is:

 (a) we actually take the object type into account explicitly.

 (b) we explicitly add another truncated hash.

The first part we can already do without any actual data structure
changes, since basically all users already know the type of an object
when they look it up.

So we already have information that we could use to narrow down the
hash collision case if we saw one.

There are some (very few) cases where we don't already explicitly have
the object type (a tag reference can be any object, for example, and
existing scripts might ask for "give me the type of this SHA1 object"
with "git cat-file -t"), but that just goes back to the whole "yeah,
we'll handle legacy uses and we will look up objects even _without_
the extra version data", so it actually integrates well into the whole
notion.

Basically, once you accept that "hey, we'll just have a list of
objects with that hash", it just makes sense to narrow it down by the
object type we also already have.

But yes, the object type is obviously only two bits of information
(actually, considering the type distribution, probably just one bit),
and it's already encoded in the first hash, so it doesn't actually
help much as "collision avoidance" particularly once you have a
particular attack against that hash in place.

It's just that it *is* extra information that we already have, and
that is very natural to use once you start thinking of the hash lookup
as returning a list of objects. It also mitigates one of the worst
_confusions_ in git, and so basically mitigates the worst-case
downside of an attack basically for free, so it seems like a
no-brainer.

But the real new piece of object version would be a truncated second
hash of the object.

I don't think it matters too much what that second hash is, I would
say that we'd just approximate having a total of 256 bits of hash.

Since we already have basically 160 bits of fairly good hashing, and
roughly 128 bits of that isn't known to be attackable, we'd just use
another hash and truncate that to 128 bits. That would be *way*
overkill in practice, but maybe overkill is what we want. And it
wouldn't really expand the objects all that much more than just
picking a new 256-bit hash would do.

So you'd have to be able to attack both the full SHA1, _and_ whatever
other different good hash to 128 bits.
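
For concreteness, a sketch of what such a combined name could look
like, using OpenSSL's one-shot SHA1/SHA256 purely as stand-ins (the
actual choice of second hash is deliberately left open above):

    #include <openssl/sha.h>
    #include <string.h>

    #define OBJ_VERSION_LEN 16   /* truncate second hash to 128 bits */

    /* Sketch only: SHA-256 stands in for "whatever other good hash". */
    struct object_name {
        unsigned char sha1[SHA_DIGEST_LENGTH];   /* primary name, as today */
        unsigned char version[OBJ_VERSION_LEN];  /* truncated second hash */
    };

    static void name_object(const void *buf, size_t len,
                            struct object_name *out)
    {
        unsigned char full[SHA256_DIGEST_LENGTH];

        SHA1(buf, len, out->sha1);      /* existing object name */
        SHA256(buf, len, full);
        memcpy(out->version, full, OBJ_VERSION_LEN);
    }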

                Linus

PS.  if people think that SHA1 is of a good _size_, and only worry
about the known weaknesses of the hashing itself, we'd only need to
get back the bits that the attacks take away from brute force. That's
currently the 80 -> ~63 bits attack, so you'd really only want about
40 bits of second hash to claw us back up to 80 bits of brute
force (again: brute force is basically sqrt() of the search space, so
half the bits, so adding 40 bits of hash adds 20 bits to the brute
force cost and you'd get back up to the 2**80 we started with).

So 128 bits of secondary hash really is much more than we'd need. 64
bits would probably be fine.
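
Spelled out (just the birthday bound from above, in LaTeX notation; no
new analysis):

    % Brute-force collision cost for an n-bit hash (birthday bound):
    \mathrm{cost}(n) \approx \sqrt{2^{n}} = 2^{n/2}
    % SHA-1: n = 160 gives 2^{80}; the known attack lowers that to
    % roughly 2^{63}.  Appending k independent bits of second hash
    % adds k/2 bits of collision cost, so restoring 2^{80} needs:
    2^{63} \cdot 2^{k/2} \ge 2^{80} \quad\Longrightarrow\quad k \ge 34
    % hence the ~40 bits above comfortably restore the original margin.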

^ permalink raw reply	[flat|nested] 136+ messages in thread

* Re: SHA1 collisions found
  2017-02-26  5:18             ` Jeff King
  2017-02-26 18:30               ` brian m. carlson
@ 2017-03-02 21:46               ` Brandon Williams
  2017-03-03 11:13                 ` Jeff King
  1 sibling, 1 reply; 136+ messages in thread
From: Brandon Williams @ 2017-03-02 21:46 UTC (permalink / raw)
  To: Jeff King
  Cc: Jason Cooper, Junio C Hamano, Linus Torvalds, Ian Jackson,
	Joey Hess, Git Mailing List

On 02/26, Jeff King wrote:
> On Sun, Feb 26, 2017 at 01:13:59AM +0000, Jason Cooper wrote:
> 
> > On Fri, Feb 24, 2017 at 10:10:01PM -0800, Junio C Hamano wrote:
> > > I was thinking we would need mixed mode support for smoother
> > > transition, but it now seems to me that the approach to stratify the
> > > history into old and new is workable.
> > 
> > As someone looking to deploy (and having previously deployed) git in
> > unconventional roles, I'd like to add one caveat.  The flag day in the
> > history is great, but I'd like to be able to confirm the integrity of
> > the old history.
> > 
> > "Counter-hashing" the blobs is easy enough, but the trees, commits and
> > tags would need to have, iiuc, some sort of cross-reference.  As in my
> > previous example, "git tag -v v3.16" also checks the counter hash to
> > further verify the integrity of the history (yes, it *really* needs to
> > check all of the old hashes, but I'd like to make sure I can do step one
> > first).
> > 
> > Would there be opposition to counter-hashing the old commits at the flag
> > day?
> 
> I don't think a counter-hash needs to be embedded into the git objects
> themselves. If the "modern" repo format stores everything primarily as
> sha-256, say, it will probably need to maintain a (local) mapping table
> of sha1/sha256 equivalence. That table can be generated at any time from
> the object data (though I suspect we'll keep it up to date as objects
> enter the repository).
> 
> At the flag day[1], you can make a signed tag with the "correct" mapping
> in the tag body (so part of the actual GPG signed data, not referenced
> by sha1). Then later you can compare that mapping to the object content
> in the repo (or to the local copy of the mapping based on that data).
> 
> -Peff
> 
> [1] You don't even need to wait until the flag day. You can do it now.
>     This is conceptually similar to the git-evtag tool, though it just
>     signs the blob contents of the tag's current tree state. Signing the
>     whole mapping lets you verify the entirety of history, but of course
>     that mapping is quite big: 20 + 32 bytes per object for
>     sha1/sha-256, which is ~250MB for the kernel. So you'd probably not
>     want to do it more than once.

There were a few of us discussing this sort of approach internally.  We
also figured that, given some performance hit, you could maintain your
repo in sha256 and do some translation to sha1 if you need to push or
fetch to a server which has the repo in a sha1 format.  This way you
can convert your repo independently of the rest of the world.

As for storing the translation table, you should really only need to
maintain the table until old clients are phased out and all of the repos
of a project have experienced flag day and have been converted to
sha256.
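
As a strawman for what that table could look like locally (the layout
is entirely hypothetical, not a proposed on-disk format): a flat array
of pairs, kept sorted by sha1 so the forward lookup is a binary search.

    #include <stdlib.h>
    #include <string.h>

    /* Hypothetical local mapping table entry. */
    struct map_entry {
        unsigned char sha1[20];     /* first, so it doubles as the key */
        unsigned char sha256[32];
    };

    static int cmp_sha1(const void *a, const void *b)
    {
        return memcmp(a, b, 20);
    }

    /* Returns the sha256 for a sha1, or NULL if unmapped. */
    static const unsigned char *map_sha1(const struct map_entry *map,
                                         size_t n,
                                         const unsigned char *sha1)
    {
        const struct map_entry *e =
            bsearch(sha1, map, n, sizeof(*map), cmp_sha1);
        return e ? e->sha256 : NULL;
    }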

-- 
Brandon Williams

^ permalink raw reply	[flat|nested] 136+ messages in thread

* Re: SHA1 collisions found
  2017-03-02 21:21             ` Linus Torvalds
@ 2017-03-02 21:54               ` Joey Hess
  2017-03-02 22:27                 ` Linus Torvalds
  0 siblings, 1 reply; 136+ messages in thread
From: Joey Hess @ 2017-03-02 21:54 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Junio C Hamano, Jeff King, Ian Jackson, Git Mailing List

[-- Attachment #1: Type: text/plain, Size: 513 bytes --]

Linus Torvalds wrote:
> So you'd have to be able to attack both the full SHA1, _and_ whatever
> other different good hash to 128 bits.

There's a surprising result of combining iterated hash functions, that
the combination is no more difficult to attack than the strongest hash
function used.

https://www.iacr.org/cryptodb/archive/2004/CRYPTO/1472/1472.pdf

Perhaps you already knew about this, but I had only heard rumors
that was the case, until I found that reference recently.

-- 
see shy jo

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 136+ messages in thread

* Re: SHA1 collisions found
  2017-03-02 21:54               ` Joey Hess
@ 2017-03-02 22:27                 ` Linus Torvalds
  2017-03-03  1:50                   ` Mike Hommey
  0 siblings, 1 reply; 136+ messages in thread
From: Linus Torvalds @ 2017-03-02 22:27 UTC (permalink / raw)
  To: Joey Hess; +Cc: Junio C Hamano, Jeff King, Ian Jackson, Git Mailing List

On Thu, Mar 2, 2017 at 1:54 PM, Joey Hess <id@joeyh.name> wrote:
>
> There's a surprising result of combining iterated hash functions, that
> the combination is no more difficult to attack than the strongest hash
> function used.

Duh. I should actually have known that. I started reading the paper
and went "this seems very familiar". I'm pretty sure I've been pointed
at that paper before (or maybe just a similar one), and I just didn't
react enough for it to leave a lasting impact.

              Linus

^ permalink raw reply	[flat|nested] 136+ messages in thread

* Re: SHA1 collisions found
  2017-03-02 22:27                 ` Linus Torvalds
@ 2017-03-03  1:50                   ` Mike Hommey
  2017-03-03  2:19                     ` Linus Torvalds
  0 siblings, 1 reply; 136+ messages in thread
From: Mike Hommey @ 2017-03-03  1:50 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Joey Hess, Junio C Hamano, Jeff King, Ian Jackson,
	Git Mailing List

On Thu, Mar 02, 2017 at 02:27:15PM -0800, Linus Torvalds wrote:
> On Thu, Mar 2, 2017 at 1:54 PM, Joey Hess <id@joeyh.name> wrote:
> >
> > There's a surprising result of combining iterated hash functions, that
> > the combination is no more difficult to attack than the strongest hash
> > function used.
> 
> Duh. I should actually have known that. I started reading the paper
> and went "this seems very familiar". I'm pretty sure I've been pointed
> at that paper before (or maybe just a similar one), and I just didn't
> react enough for it to leave a lasting impact.

What if the "object version" is a hash of the content (as opposed to
header + content like the normal git hash)?

Mike

^ permalink raw reply	[flat|nested] 136+ messages in thread

* Re: SHA1 collisions found
  2017-03-03  1:50                   ` Mike Hommey
@ 2017-03-03  2:19                     ` Linus Torvalds
  0 siblings, 0 replies; 136+ messages in thread
From: Linus Torvalds @ 2017-03-03  2:19 UTC (permalink / raw)
  To: Mike Hommey
  Cc: Joey Hess, Junio C Hamano, Jeff King, Ian Jackson,
	Git Mailing List

On Thu, Mar 2, 2017 at 5:50 PM, Mike Hommey <mh@glandium.org> wrote:
>
> What if the "object version" is a hash of the content (as opposed to
> header + content like the normal git hash)?

It doesn't actually matter for that attack.

The concept of the attack is actually fairly simple: generate a lot of
collisions in the first hash (and they outline how you only need to
generate 't' serial collisions and turn them into 2**t collisions),
and then just check those collisions against the second hash.

If you have enough collisions in the first one, having a collision in
the second one is inevitable just from the birthday rule.

Now, *in practice* that attack is not very easy to do. Each collision
is still hard to generate. And because of the git object rules (the
size has to match), it limits you a bit in what collisions you
generate.

But the fact that git requires the right header can be considered just
a variation on the initial state to SHA1, and then the additional
requirement might be as easy as just saying that your collision
generation function just always needs to generate a fixed-size block
(which could be just 64 bytes - the SHA1 blocking size).

So assuming you can arbitrarily generate collisions (brute force:
2**80 operations) you could make the rule be that you generate
something that starts with one 64-byte block that matches the git
rules:

   "blob 6454\0"..pad with repeating NUL bytes..

and then you generate 100 pairs of 64-byte SHA1 collisions (where the
first starts with the initial value of that fixed blob prefix, the
next with the state after the first block, etc etc).

Now you can generate 2**100 different sequences that all are exactly
6464 bytes (101 64-byte blocks) and all have the same SHA1 - and all
share that first fixed 64-byte block.

You needed "only" on the order of 100 * 2**80 SHA1 operations to do
that in theory.

And since you have 2**100 objects, you know that you will have a likely
birthday paradox even if your secondary hash is 200 bits long.

So all in all, you generated a collision in on the order of 2**100 operations.

So instead of getting the security of "160+200" bits, you only got 200
bits worth of real collision resistance.
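
In symbols, the construction above (a restatement, not new material):

    % t chained single-block collisions in an iterated n-bit hash cost
    % about t * 2^(n/2) total and compose into 2^t equal-digest messages:
    W_{\mathrm{multi}}(t) \approx t \cdot 2^{n/2}
    % t = m/2 candidates suffice to birthday-collide a second m-bit
    % hash, so attacking the concatenation costs only about:
    W_{\mathrm{total}} \approx \tfrac{m}{2} \cdot 2^{n/2} + 2^{m/2}
    % With n = 160 and m = 200 this is dominated by 2^{100}, far below
    % the naive 2^{(n+m)/2} = 2^{180}.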

NOTE!! All of the above is very much assuming "brute force". If you
can brute-force the hash, you can completely break any hash. The input
block to SHA1 is 64 bytes, so by definition you have 512 bits of data
to play with, and you're generating a 160-bit output: there is no
question what-so-ever that you couldn't generate any hash you want if
you brute-force things.

The place where things like having a fixed object header can help is
when the attack in question requires some repeated patterns. For
example, if you're not brute-forcing things, your attack on the hash
will likely involve using very particular patterns to change a number
of bits in certain ways, and then combining those particular patterns
to get the hash collision you wanted.  And *that* is when you may need
to add some particular data to the middle to make the end result be a
particular match.

But a brute-force attack definitely doesn't need size changes. You can
make the size be pretty much anything you want (modulo really small
inputs, of course - a one-byte input only has 256 different possible
hashes ;) if you have the resources to just go and try every possible
combination until you get the hash you wanted.

I may have overly simplified the paper, but that's the basic idea.

               Linus

^ permalink raw reply	[flat|nested] 136+ messages in thread

* Re: SHA1 collisions found
  2017-03-02 19:55         ` Linus Torvalds
  2017-03-02 20:43           ` Junio C Hamano
@ 2017-03-03 11:04           ` Jeff King
  2017-03-03 21:47           ` Stefan Beller
  2 siblings, 0 replies; 136+ messages in thread
From: Jeff King @ 2017-03-03 11:04 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Junio C Hamano, Ian Jackson, Joey Hess, Git Mailing List

On Thu, Mar 02, 2017 at 11:55:45AM -0800, Linus Torvalds wrote:

> Anyway, I do have a suggestion for what the "object version" would be,
> but I'm not even going to mention it, because I want people to first
> think about the _concept_ and not the implementation.
> 
> So: What do you think about the concept?

I think it very much depends on what's in the "object version". :)

IMHO, we are best to consider sha1 "broken" and not count on any of its
bytes for cryptographic integrity. I know that's not really the case,
but it just makes reasoning about the whole thing simpler. So at that
point, it's pretty obvious that the "object version" is really just "an
integrity hash".

And that takes us full circle to earlier proposals over the years to do
something like this in the commit header:

  parent ...some sha1...
  parent-sha256 ...some sha256...

and ditto in tag headers, and trees obviously need to be hackily
extended as you described to carry the extra hash. And then internally
we continue to happily use sha1s, except you can check the
sha256-validity of any reference if you feel like it.
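
A commit carrying both headers might look something like this (a
sketch; every name below is an illustrative placeholder):

    tree <tree sha1>
    parent <parent sha1>
    parent-sha256 <parent sha256>
    author A U Thor <author@example.com> 1488499200 -0800
    committer C O Mitter <committer@example.com> 1488499200 -0800

    commit message as usual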

This is functionally equivalent to "just start using sha-256, but keep a
mapping of old sha1s to sha-256s to handle old references". The
advantage is that it makes the code part of the transition simpler. The
disadvantage is that you're effectively carrying a piece of that
sha1->sha256 mapping around in _every_ object.

And that means the same bits of mapping data are repeated over and over.
Git's pretty good at de-duplicating on the surface. So yeah, every tree
entry is now 256 bits larger, but deltas mean that we usually only end
up storing each entry a handful of times. But we still pay the price to
walk over the bytes every time we apply a delta, zlib inflate, parse the
tree, etc. The runtime cost of the transition is carried forward
forever, even for repositories that are willing to rewrite history, or
are created after the flag day.

So I dunno. Maybe I am missing something really clever about your
proposal. Reading the rest of the thread, it sounds like you had a
thought that we could get by with a very tiny object version, but the
hash-adding thing nixed that. If I'm still missing the point, please try
to sketch it out a bit more concretely, and I'll come back with my
thinking cap on.

-Peff

^ permalink raw reply	[flat|nested] 136+ messages in thread

* Re: SHA1 collisions found
  2017-03-02 21:46               ` Brandon Williams
@ 2017-03-03 11:13                 ` Jeff King
  2017-03-03 14:54                   ` Ian Jackson
  0 siblings, 1 reply; 136+ messages in thread
From: Jeff King @ 2017-03-03 11:13 UTC (permalink / raw)
  To: Brandon Williams
  Cc: Jason Cooper, Junio C Hamano, Linus Torvalds, Ian Jackson,
	Joey Hess, Git Mailing List

On Thu, Mar 02, 2017 at 01:46:10PM -0800, Brandon Williams wrote:

> There were a few of us discussing this sort of approach internally.  We
> also figured that, given some performance hit, you could maintain your
> repo in sha256 and do some translation to sha1 if you need to push or
> fetch to a server which has the repo in a sha1 format.  This way you
> can convert your repo independently of the rest of the world.

Yeah, you definitely _can_ convert between the two. It's just expensive
to do so on the fly. We'd potentially want to be able to during the
transition period just to help people get all the work converted over.
But I had assumed that the conversion would be a mix of:

  1. Unpublished work (or work which is otherwise OK to be rewritten)
     could be converted to the new hash.

  2. Old history could be grafted with a parent pointer that mentions
     the tip of the old history by its new hash, but the pointed-to
     parent contains sha1s.

> As for storing the translation table, you should really only need to
> maintain the table until old clients are phased out and all of the repos
> of a project have experienced flag day and have been converted to
> sha256.

I think you've read more into my "conversion" than I intended. The old
history won't get rewritten. It will just be grafted onto the bottom of
the commit history you've got, and the new trees will all be written
with the new hash.

So you still have those old objects hanging around that refer to things
by their sha1 (not to mention bug trackers, commit messages, etc, which
all use commit ids). And you want to be able to quickly resolve those
references.

What _does_ get rewritten is what's in your ref files, your pack .idx,
etc. Those are all sha256 (or whatever), and work as sha1's do now.
Looking up a sha1 reference from an old object just goes through the
extra level of indirection.

-Peff

^ permalink raw reply	[flat|nested] 136+ messages in thread

* Re: SHA1 collisions found
  2017-03-03 11:13                 ` Jeff King
@ 2017-03-03 14:54                   ` Ian Jackson
  2017-03-03 22:18                     ` Jeff King
  0 siblings, 1 reply; 136+ messages in thread
From: Ian Jackson @ 2017-03-03 14:54 UTC (permalink / raw)
  To: Jeff King
  Cc: Brandon Williams, Jason Cooper, Junio C Hamano, Linus Torvalds,
	Joey Hess, Git Mailing List

Jeff King writes ("Re: SHA1 collisions found"):
> I think you've read more into my "conversion" than I intended. The old
> history won't get rewritten. It will just be grafted onto the bottom of
> the commit history you've got, and the new trees will all be written
> with the new hash.
> 
> So you still have those old objects hanging around that refer to things
> by their sha1 (not to mention bug trackers, commit messages, etc, which
> all use commit ids). And you want to be able to quickly resolve those
> references.
> 
> What _does_ get rewritten is what's in your ref files, your pack .idx,
> etc. Those are all sha256 (or whatever), and work as sha1's do now.

This all sounds very similar to my proposal.

> Looking up a sha1 reference from an old object just goes through the
> extra level of indirection.

I don't understand why this is a level of indirection, rather than
simply a retention of the existing SHA-1 object database (in parallel,
but deprecated).

Perhaps I have misunderstood what you mean by "graft".  I assume you
don't mean info/grafts, because that is not conveyed by transfer
protocols.


Stepping back a bit, the key question is what the data structure will
look like after the transition.

Specifically, the parent reference in the first post-transition commit
has to refer to something.  What does it refer to ?  The possibilities
seem to be:

 1a. It names the SHA1 hash of an old commit object
 1b. It names the BLAKE[1] hash of an old commit object, which
    object of course refers to its parents by SHA1.

 2. It names the BLAKE hash of a translated version of the
    old commit object.

 3. It doesn't name the parent, and the old history is not
    automatically transferred by clone and not readily accessible.

(1a) and (1b) are different variants of something like my mixed hash
proposal.  Old and new hashes live side by side.

(2) involves rewriting all of the old history, to recursively generate
new objects (with BLAKE names, and which in turn refer to other
rewritten old objects by BLAKE names).  The first time a particular
tree needs to look up an object by a BLAKE name, it needs to run a
conversion of its own entire existing history.

For (2) there would have to be some kind of mapping table in every
tree, which allowed object names to be mapped in both directions.  The
object content translation would have to be reversible, so that the
actual pre-translation objects would not need to be stored; rather
they would be reconstructed from the post-translation objects, when
someone asks for a pre-translation object.  In principle it would be
possible to convert future BLAKE commits to SHA-1 ones, again by
recursive rewriting.

I don't think anyone is seriously suggesting (3).


So there is a choice between:

(1) a unified hash tree containing a mixture of different hashes at
different reference points, where each object has one identity and one
name.

(2) parallel hash tree structures, each using only a single hash, with
at least every old object present in both tree structures.

I think (1) is preferable because it provides, to callers of git, the
existing object naming semantics.  Callers need to be taught to accept
an extension to the object name format.  Existing object names stored
elsewhere than in git remain valid.

Conversely, (2) requires many object names stored elsewhere than in
git to be updated.  It's possible with (2) to do ad-hoc lookups on
object names in mailing list messages or commit messages and so on.
Even if it is possible for the new git to answer questions like "does
this new branch with BLAKE hash X' contain the commit with SHA1 hash
Y" by implicitly looking up the corresponding BLAKE commit Y' and
answering the question with reference to Y', this isn't going to help
if external code does things like "have git compute the merge base of
X and Y' and check that it is equal to Z".  Either the external
database's record of Z would have to be changed to refer to Z', or the
external code would have to be taught to apply an object name
normalisation operation to turn Z into Z' each time.

Also, (2) has trouble with shallow clones.  This is because it's not
possible to translate old objects to new ones without descending to
the roots of the object tree and recursively translating everything
(or looking up the answer of a previous translation).


Then there is the question of naming syntax.

The distinction between (1) single unified mixed hash tree, and
(2) multiple parallel homogenous hash trees, is mostly orthogonal to
the command-line (and in-commit-object etc.) representation of new
hashes.

The main thing here is that, regardless of the choice between (1) or
(2), we need to choose whether object names specified on the git
command line, and printed by normal git commands, explicitly identify
the hash function.

I think there are roughly three options:

 (A) Decorate all new hashes with a hash function indication
     (sha256:<hex> or blake_<hex> or H<hex>)

 (B) Infer the hash function from the object name length
     (and do some kind of bodge for abbreviated object names).

 (C) Hash function is implicit from context.  (This is compatible with
     (2) only, because (1) requires any object to be able to refer to
     any hash function.)

I think (A) is best because it means everything is unambiguous, and it
allows future hash function changes without further difficulty.

(B) is a reasonable possibility although the abbreviated object name
bodge would be quite ugly and probably involve thinking about several
annoying edge cases.

I think (C) is really bad, because it instantly makes all existing
application code which calls git buggy.  Such application code
would need to be adjusted to know for itself which hash function each
of its recorded object names uses, and explicitly specify this to
its git operations somehow.

All of these options involve updating many callers of git.  In any
case any git caller which explicitly checks the object name length
will need to be changed.  For (A), many git callers which match object
names using something like [0-9a-f]+ rather than \w+ will need to be
changed - but at least it's a simple change with little semantic
import.

(A) has the additional advantage that it becomes possible to make
object names syntactically distinguishable from ref names.
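
For concreteness, a minimal sketch of option (A) input parsing (the
prefixes and fallback rule here are illustrative, not a proposal for
exact syntax):

    #include <string.h>

    enum hash_algo { HASH_SHA1, HASH_BLAKE2B, HASH_UNKNOWN };

    /* Hypothetical prefixes; peel an "algo:" prefix off an object
     * name, with undecorated 40-hex names falling back to SHA-1 for
     * compatibility with old callers. */
    static enum hash_algo parse_name(const char *name, const char **hex)
    {
        if (!strncmp(name, "sha1:", 5)) {
            *hex = name + 5;
            return HASH_SHA1;
        }
        if (!strncmp(name, "blake:", 6)) {
            *hex = name + 6;
            return HASH_BLAKE2B;
        }
        *hex = name;
        return strlen(name) == 40 ? HASH_SHA1 : HASH_UNKNOWN;
    }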


The final argument I would make is this:

We don't know what hash function research will look like in 10-20
years.  We would like to not have a bunch of pain again.  So ideally
we would deploy a framework now that would let us switch hash function
again without further history-rewriting.

(1)(A) and perhaps (1)(B) are the only options which support this
well.


Ian.

[1] I'm going to keep assuming that the bikeshed will be blue, because
I think BLAKE2b is a better choice.  It has probably had more
serious people looking at it than SHA-3, at least, and it has good
performance.  The web page has an impressive adoption list - probably
wider than SHA-3.

-- 
Ian Jackson <ijackson@chiark.greenend.org.uk>   These opinions are my own.

If I emailed you from an address @fyvzl.net or @evade.org.uk, that is
a private address which bypasses my fierce spamfilter.

^ permalink raw reply	[flat|nested] 136+ messages in thread

* Re: SHA1 collisions found
  2017-03-02 19:55         ` Linus Torvalds
  2017-03-02 20:43           ` Junio C Hamano
  2017-03-03 11:04           ` Jeff King
@ 2017-03-03 21:47           ` Stefan Beller
  2 siblings, 0 replies; 136+ messages in thread
From: Stefan Beller @ 2017-03-03 21:47 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Jeff King, Junio C Hamano, Ian Jackson, Joey Hess,
	Git Mailing List

On Thu, Mar 2, 2017 at 11:55 AM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> So: What do you think about the concept?
>
>                Linus

One of the things I like about working on Git is its pretty
high standard of testing. So we would need to come up with
good methods of testing this, e.g. when
GIT_TEST_WITH_DEGENERATE_HASH is set, we'd use
an intentionally weak hashing function and have tests for
the collisions. These tests would need cover most of the
workflows that are currently performed with Git
(local creation, fetching, pushing). Writing all these
additional tests (which consists of creating colliding
objects/commits and then performing all these tests),
sounds about as much work as actually converting to
a new hash function. (First locally and then at a later
point in time all the networking related things).

I would not want to go that way.

Stefan

^ permalink raw reply	[flat|nested] 136+ messages in thread

* Re: SHA1 collisions found
  2017-03-03 14:54                   ` Ian Jackson
@ 2017-03-03 22:18                     ` Jeff King
  0 siblings, 0 replies; 136+ messages in thread
From: Jeff King @ 2017-03-03 22:18 UTC (permalink / raw)
  To: Ian Jackson
  Cc: Brandon Williams, Jason Cooper, Junio C Hamano, Linus Torvalds,
	Joey Hess, Git Mailing List

On Fri, Mar 03, 2017 at 02:54:56PM +0000, Ian Jackson wrote:

> > What _does_ get rewritten is what's in your ref files, your pack .idx,
> > etc. Those are all sha256 (or whatever), and work as sha1's do now.
> 
> This all sounds very similar to my proposal.

Yeah, sorry I haven't reviewed that more carefully yet.

> > Looking up a sha1 reference from an old object just goes through the
> > extra level of indirection.
> 
> I don't understand why this is a level of indirection, rather than
> simply a retention of the existing SHA-1 object database (in parallel,
> but deprecated).

I just meant that we will need to know both the sha1 and the sha-256 of
every object (because the same blob, for example, may be referred to by
either name depending on whether it is a new or a historical tree).  So
one way to do that is to have a table mapping sha1 to sha-256, and then
a lookup of sha-1 goes through that before looking up the object content
on disk via sha-256.

But it may also be fine to just keep an index mapping sha1 directly to
object contents. That makes a sha1 lookup slightly faster, but it's more
expensive to do a sha-256 verification (you have to reverse-index the
object location in the sha-256 list).

(And my usual disclaimer: I am using sha-256 as a placeholder; I don't
have a strong opinion on the actual hash choice).

> Perhaps I have misunderstood what you mean by "graft".  I assume you
> don't mean info/grafts, because that is not conveyed by transfer
> protocols.

No, I just mean there will be a spot in the commit graph (or many spots,
potentially) where a "v2" commit using sha-256 references (by sha-256) a
"v1" commit that is full of sha-1s. I say "graft" only to differentiate
it from the idea of rewriting the content of those commit objects.

> Specifically, the parent reference in the first post-transition commit
> has to refer to something.  What does it refer to ?  The possibilities
> seem to be:
> 
>  1a. It names the SHA1 hash of an old commit object
>  1b. It names the BLAKE[1] hash of an old commit object, which
>     object of course refers to its parents by SHA1.
> 
>  2. It names the BLAKE hash of a translated version of the
>     old commit object.
> 
>  3. It doesn't name the parent, and the old history is not
>     automatically transferred by clone and not readily accessible.
> 
> (1a) and (1b) are different variants of something like my mixed hash
> proposal.  Old and new hashes live side by side.

Thanks for laying out those options. My proposals have definitely been
in the (1b) camp.

I think (1a) is not so bad; it just bumps the transition point one
commit higher. But I think the "rules" for which hash to expect are
easier if they depend on which object the _pointer_ is using, rather
than the _pointee_.

> (2) involves rewriting all of the old history, to recursively generate
> new objects (with BLAKE names, and which in turn refer to other
> rewritten old objects by BLAKE names).  The first time a particular
> tree needs to look up an object by a BLAKE name, it needs to run a
> conversion of its own entire existing history.
> 
> For (2) there would have to be some kind of mapping table in every
> tree, which allowed object names to be mapped in both directions.  The
> object content translation would have to be reversible, so that the
> actual pre-translation objects would not need to be stored; rather
> they would be reconstructed from the post-translation objects, when
> someone asks for a pre-translation object.  In principle it would be
> possible to convert future BLAKE commits to SHA-1 ones, again by
> recursive rewriting.

Hmm. I had initially rejected this as being pretty nasty for accessing
the old format on the fly. But as long as you keep the bidirectional
mapping from the initial expensive conversion, in most cases you only
need to convert at most a single object. E.g., the two cases that are
really interesting are:

  - I have an old commit sha1 and I want to run "git show" on it. We
    convert the sha1 to the BLAKE name, and then just show the BLAKE
    contents as usual.

  - I want to verify a commit or tag signature. We need the original
    bytes for this. So we convert the sha1 to the BLAKE name to get the
    BLAKE'd contents. Then we rewrite only the _single_ object,
    converting any of its internal BLAKE references back to sha1s via
    the mapping.

That's more appealing than I had originally given it credit for, because
most of the other code just happily uses the BLAKE name internally. Even
a diff across the conversion boundary works at full speed, because it's
using the same names consistently.

The big downside is that the mapping is more expensive to generate than
a 1b-style mapping. In 1b, you just compute both hashes over all
incoming objects and store the sha1 in a side lookup table. But here you
actually need to _rewrite_ each object to come up with its
alternate-universe sha1. And you need to do it in reverse-graph order,
not the arbitrary order that something like index-pack uses.

I had assumed that local repos would generate these tables themselves,
so to not have to trust any other repos. And the alternate-universe
thing is more to ask those local repos to do. But it would also be
workable to distribute the mapping out-of-band (e.g., via a signed tag).

You don't even really need to trust the mapping that much, if it's just
used for historical name lookups.

> I don't think anyone is seriously suggesting (3).

Yeah, agreed.

> Conversely, (2) requires many object names stored elsewhere than in
> git to be updated.  It's possible with (2) to do ad-hoc lookups on
> object names in mailing list messages or commit messages and so on.
> Even if it is possible for the new git to answer questions like "does
> this new branch with BLAKE hash X' contain the commit with SHA1 hash
> Y" by implicitly looking up the corresponding BLAKE commit Y' and
> answering the question with reference to Y', this isn't going to help
> if external code does things like "have git compute the merge base of
> X and Y' and check that it is equal to Z".  Either the external
> database's record of Z would have to be changed to refer to Z', or the
> external code would have to be taught to apply an object name
> normalisation operation to turn Z into Z' each time.

I think the X->X' conversion on input is easy. The universe of objects
whose sha1 we care about is not getting any bigger. Outputting sha1 Z
instead of BLAKE Z' is a bit harder (and at the very least probably
needs the caller to use a specific command-line option).

> Also, (2) has trouble with shallow clones.  This is because it's not
> possible to translate old objects to new ones without descending to
> the roots of the object tree and recursively translating everything
> (or looking up the answer of a previous translation).

Yes. Though if we distribute a partial sha1/BLAKE mapping with the clone
(i.e., just for the objects we're sending) then the client could still
use sha1 names. It may not be that big a problem in practice, though. If
you're shallow, then you don't _have_ the old object names to refer to
anyway.

> The main thing here is that, regardless of the choice between (1) or
> (2), we need to choose whether object names specified on the git
> command line, and printed by normal git commands, explicitly identify
> the hash function.
> 
> I think there are roughly three options:
> 
>  (A) Decorate all new hashes with a hash function indication
>      (sha256:<hex> or blake_<hex> or H<hex>)
> 
>  (B) Infer the hash function from the object name length
>      (and do some kind of bodge for abbreviated object names).
> 
>  (C) Hash function is implicit from context.  (This is compatible with
>      (2) only, because (1) requires any object to be able to refer to
>      any hash function.)
> 
> I think (A) is best because it means everything is unambiguous, and it
> allows future hash function changes without further difficulty.

For input, I think we should definitely _support_ A, but in practice I
think people would be happy if an undecorated hash (or partial hash)
is looked up as a BLAKE name first, and then falls back to the sha1.

In (2) this is obviously the right thing to do, because all of our
output will be BLAKE names.

In (1) it is less clear if we might output sha1 names for old cases. But
I think we're still better off using the stronger hash when possible.

> (A) has the additional advantage that it becomes possible to make
> object names syntactically distinguishable from ref names.

Sort of. Our get_sha1() parser accepts a lot of random syntax.  E.g.,
"sha256:1234abcd" is ambiguous with a file inside the tree named by
"sha256". In practice I don't mind carving out a namespace and letting
people with a branch named "sha256" rot.

> We don't know what hash function research will look like in 10-20
> years.  We would like to not have a bunch of pain again.  So ideally
> we would deploy a framework now that would let us switch hash function
> again without further history-rewriting.
> 
> (1)(A) and perhaps (1)(B) are the only options which support this
> well.

Yes, I think planning for another migration is a sensible thing.

I just think we should not sacrifice any other properties to the idea
that people could flip on their bespoke hashes and interoperate with
other users. I.e., "git config core.hash sha3 && git push" should not be
a use case we care about at all, because it creates all sorts of _other_
headaches.

But I have no objection to making the 20-years-from-now migration less
painful, and I agree that (1b) is more along those lines.

-Peff

^ permalink raw reply	[flat|nested] 136+ messages in thread

* Re: Transition plan for git to move to a new hash function
  2017-03-02 18:13                                 ` Ian Jackson
@ 2017-03-04 22:49                                   ` brian m. carlson
  2017-03-05 13:45                                     ` Ian Jackson
  0 siblings, 1 reply; 136+ messages in thread
From: brian m. carlson @ 2017-03-04 22:49 UTC (permalink / raw)
  To: Ian Jackson
  Cc: Jeff King, Ævar Arnfjörð Bjarmason, Linus Torvalds,
	Jason Cooper, ankostis, Junio C Hamano, Git Mailing List,
	Stefan Beller, David Lang, Joey Hess

[-- Attachment #1: Type: text/plain, Size: 1457 bytes --]

On Thu, Mar 02, 2017 at 06:13:27PM +0000, Ian Jackson wrote:
> brian m. carlson writes ("Re: Transition plan for git to move to a new hash function"):
> > On Mon, Feb 27, 2017 at 01:00:01PM +0000, Ian Jackson wrote:
> > > Objects of one hash may refer to objects named by a different hash
> > > function to their own.  Preference rules arrange that normally, new
> > > hash objects refer to other new hash objects.
> > 
> > The existing codebase isn't really designed with that in mind.
> 
> Yes.  I've seen the attempts to start to replace char* with a hash
> struct.

My comment actually has nothing to do with the way struct object_id is
set up.  That actually can be trivially extended with a byte or two of
type.

Instead, I was referring to areas like the notes code.  It has extensive
use of the last byte as a type of lookup table key.  It's very dependent
on having exactly one hash, since it will always want to use the last
byte.

There are other, more subtle areas of the code that just don't handle
multiple hashes well.  Ideally we would remedy this, but I think
everyone is very eager to move away from SHA-1, and since nobody has
stepped up to volunteer to do that work, we should probably adopt a
solution that doesn't involve doing that.
-- 
brian m. carlson / brian with sandals: Houston, Texas, US
+1 832 623 2791 | https://www.crustytoothpaste.net/~bmc | My opinion only
OpenPGP: https://keybase.io/bk2204

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 868 bytes --]

^ permalink raw reply	[flat|nested] 136+ messages in thread

* Re: Transition plan for git to move to a new hash function
  2017-03-04 22:49                                   ` brian m. carlson
@ 2017-03-05 13:45                                     ` Ian Jackson
  2017-03-05 23:45                                       ` brian m. carlson
  0 siblings, 1 reply; 136+ messages in thread
From: Ian Jackson @ 2017-03-05 13:45 UTC (permalink / raw)
  To: brian m. carlson
  Cc: Jeff King, Ævar Arnfjörð Bjarmason, Linus Torvalds,
	Jason Cooper, ankostis, Junio C Hamano, Git Mailing List,
	Stefan Beller, David Lang, Joey Hess

brian m. carlson writes ("Re: Transition plan for git to move to a new hash function"):
> Instead, I was referring to areas like the notes code.  It has extensive
> use of the last byte as a type of lookup table key.  It's very dependent
> on having exactly one hash, since it will always want to use the last
> byte.

You mean note_tree_search ?  (My tree here may be a bit out of date.)
This doesn't seem difficult to fix.  The nontrivial changes would be
mostly confined to SUBTREE_SHA1_PREFIXCMP and GET_NIBBLE.

It's true that like most of git there's a lot of hardcoded `sha1'.


Are you arguing in favour of "replace git with git2 by simply
s/20/64/g; s/sha1/blake/g" ?  This seems to me to be a poor idea.
Takeup of the new `git2' would be very slow because of the pain
involved.

Any sensible method of moving to a new hash that isn't "make a
completely incompatible new version of git" is going to involve
teaching the code we have in git right now to handle new hashes as
well as sha1 hashes.

Even if the plan is to try to convert old data, rather than keep it
and be able to refer to it from new data, something will have to be
able to parse old packfiles, old commits, old tags, old notes,
etc. etc. etc.  Either that's going to be some separate conversion
utility, or it has to be the same code in git that's there already.[1]

The ability to handle both old-format and new-format data can be
achieved in the code by doing away with the hardcoded sha1s, so that
instead the hash is an abstract data type with operations like
"initialise", "compare", "get a nybble", etc.  We've already seen
patches going in this direction.
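
Such an abstraction might look roughly like this (a sketch; the names
are hypothetical and don't correspond to any actual patch series):

    #include <stddef.h>

    /* Hypothetical interface: one instance per supported algorithm,
     * so callers stop hardcoding 20-byte buffers and ask the
     * algorithm for its sizes and operations instead. */
    struct hash_algo {
        const char *name;       /* e.g. "sha1", "blake2b" */
        size_t rawsz;           /* digest size in bytes */
        void (*init)(void *ctx);
        void (*update)(void *ctx, const void *data, size_t len);
        void (*final)(unsigned char *digest, void *ctx);
        int (*cmp)(const unsigned char *a, const unsigned char *b);
        /* "get a nybble", for fanout code like the notes tree */
        unsigned (*get_nibble)(const unsigned char *digest, size_t i);
    };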

[1] I've heard suggestions here that instead we should expect users to
"git1 fast-export", which you would presumably feed into "git2
fast-import".  But what is `git1' here ?  Is it the current git
codebase frozen in time ?  I don't think it can be.  With this
conversion strategy, we will need to maintain git1 for decades.  It
will need portability fixes, security fixes, fixes for new hostile
compiler optimisations, and so on.  The difficulty of conversion means
there will be pressure to backport new features from `git2' to `git1'.
(Also this approach means that all signatures are definitively lost
during the conversion process.)

So if we want to provide both `git1' and `git2', it's still better to
compile `git' and `git2' from the same codebase.  But if we do that,
the resulting ifdeffery and/or other hash abstractions are most of the
work to be hash-agile.  It's just the difference between a
compile-time and runtime switch.

I think the incompatibile approach is much more work in the medium and
long term - and it leads to a longer transition period.


Bear in mind that our objective is not to minimise the time until the
new version of git is available.  Our objective is to minimise the
time until (most) people are using it.  An approach which takes longer
for the git community to develop, but which is easier to deploy, can
easily be better.

Or maybe the objective is to minimise overall effort.  In which case
more work on git, for an easier transition for all the users, seems
like a no-brainer.  I think this is arguably true even from the point
of view of effort amongst the community of git contributors.  git
contributors start out as git users - and if git's users are all busy
struggling with a difficult transition, they will have less time to
improve other stuff and will tend less to get involved upstream.  (And
they may be less inclined to feel that the git upstream developers
understand their needs well.)

The better alternative is to adopt a plan that has a clear and
straightforward transition for users, and ask git users to help with
implementation.

I think many git users, including sophisticated users and competent
organisations, are concerned about sha1.  Currently most of those
users will find it difficult to help, because it's not clear to them
what needs to be done.

Thanks,
Ian.

^ permalink raw reply	[flat|nested] 136+ messages in thread

* Re: Transition plan for git to move to a new hash function
  2017-03-05 13:45                                     ` Ian Jackson
@ 2017-03-05 23:45                                       ` brian m. carlson
  0 siblings, 0 replies; 136+ messages in thread
From: brian m. carlson @ 2017-03-05 23:45 UTC (permalink / raw)
  To: Ian Jackson
  Cc: Jeff King, Ævar Arnfjörð Bjarmason, Linus Torvalds,
	Jason Cooper, ankostis, Junio C Hamano, Git Mailing List,
	Stefan Beller, David Lang, Joey Hess

[-- Attachment #1: Type: text/plain, Size: 3320 bytes --]

On Sun, Mar 05, 2017 at 01:45:46PM +0000, Ian Jackson wrote:
> brian m. carlson writes ("Re: Transition plan for git to move to a new hash function"):
> > Instead, I was referring to areas like the notes code.  It has extensive
> > use of the last byte as a type of lookup table key.  It's very dependent
> > on having exactly one hash, since it will always want to use the last
> > byte.
> 
> You mean note_tree_search ?  (My tree here may be a bit out of date.)
> This doesn't seem difficult to fix.  The nontrivial changes would be
> mostly confined to SUBTREE_SHA1_PREFIXCMP and GET_NIBBLE.
> 
> It's true that like most of git there's a lot of hardcoded `sha1'.

I'm talking about the entire notes.c file.  There are several different
uses of "19" in there, and they compose at least two separate concepts.
My object-id-part9 series tries to split those out into logical
constants.

This code is not going to handle repositories with different-length
object names well, which I believe was your initial proposal.  I originally
thought that mixed-hash repositories would be viable as well, but I no
longer do.

> Are you arguing in favour of "replace git with git2 by simply
> s/20/64/g; s/sha1/blake/g" ?  This seems to me to be a poor idea.
> Takeup of the new `git2' would be very slow because of the pain
> involved.

I'm arguing that the same binary ought to be able to handle both SHA-1
and the new hash.  I'm also arguing that a given object have exactly one
hash and that we not mix hashes in the same object.  A repository will
be composed of one type of object, and if that's the new hash, a lookup
table will be used to translate SHA-1.  We can synthesize the old
objects, should we need them.

That allows people to use the SHA-1 hashes (in my view, with a prefix,
such as "sha1:") in repositories using the new hash.  It also allows
verifying old tags and commits if need be.

What I *would* like to see is an extension to the tag and commit objects
which names the hash that was used to make them.  That makes it easy to
determine which object the signature should be verified over, as it will
verify over only one of them.
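
Something along these lines (the header name and placement are purely
illustrative) would pin the hash down unambiguously:

    object <new-hash hex>
    type commit
    tag v4.0
    hash blake2b
    tagger C O Mitter <committer@example.com> 1488672000 -0600

    tag message as usual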

> [1] I've heard suggestions here that instead we should expect users to
> "git1 fast-export", which you would presumably feed into "git2
> fast-import".  But what is `git1' here ?  Is it the current git
> codebase frozen in time ?  I don't think it can be.  With this
> conversion strategy, we will need to maintain git1 for decades.  It
> will need portability fixes, security fixes, fixes for new hostile
> compiler optimisations, and so on.  The difficulty of conversion means
> there will be pressure to backport new features from `git2' to `git1'.
> (Also this approach means that all signatures are definitively lost
> during the conversion process.)

I'm proposing we have a git hash-convert (the name doesn't matter that
much) that converts in place.  It rebuilds the objects and builds a
lookup table.  Since the contents of git objects are deterministic, this
makes it possible for each individual user to make the transition in
place.
-- 
brian m. carlson / brian with sandals: Houston, Texas, US
+1 832 623 2791 | https://www.crustytoothpaste.net/~bmc | My opinion only
OpenPGP: https://keybase.io/bk2204

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 868 bytes --]

^ permalink raw reply	[flat|nested] 136+ messages in thread

end of thread, other threads:[~2017-03-05 23:45 UTC | newest]

Thread overview: 136+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-02-23 16:43 SHA1 collisions found Joey Hess
2017-02-23 17:00 ` David Lang
2017-02-23 17:02 ` Junio C Hamano
2017-02-23 17:12   ` David Lang
2017-02-23 20:49     ` Jakub Narębski
2017-02-23 20:57       ` Jeff King
2017-02-23 17:18   ` Junio C Hamano
2017-02-23 17:35   ` Joey Hess
2017-02-23 17:52     ` Linus Torvalds
2017-02-23 18:21       ` Joey Hess
2017-02-23 18:31         ` Joey Hess
2017-02-23 19:13           ` Morten Welinder
2017-02-24 15:52             ` Geert Uytterhoeven
2017-02-23 18:40         ` Linus Torvalds
2017-02-23 18:46           ` Jeff King
2017-02-23 19:09             ` Linus Torvalds
2017-02-23 19:32               ` Jeff King
2017-02-23 19:47                 ` Linus Torvalds
2017-02-23 19:57                   ` Jeff King
     [not found]                     ` <alpine.LFD.2.20.1702231428540.30435@i7.lan>
2017-02-23 22:43                       ` Jeff King
2017-02-23 22:50                         ` Linus Torvalds
2017-02-23 23:05                         ` Jeff King
2017-02-23 23:05                           ` [PATCH 1/3] add collision-detecting sha1 implementation Jeff King
2017-02-23 23:15                             ` Stefan Beller
2017-02-24  0:01                               ` Jeff King
2017-02-24  0:12                                 ` Linus Torvalds
2017-02-24  0:16                                   ` Jeff King
2017-02-23 23:05                           ` [PATCH 2/3] sha1dc: adjust header includes for git Jeff King
2017-02-23 23:06                           ` [PATCH 3/3] Makefile: add USE_SHA1DC knob Jeff King
2017-02-24 18:36                             ` HW42
2017-02-24 18:57                               ` Jeff King
2017-02-23 23:14                           ` SHA1 collisions found Linus Torvalds
2017-02-28 18:41                           ` Junio C Hamano
2017-02-28 19:07                             ` Junio C Hamano
2017-02-28 19:20                               ` Jeff King
2017-03-01  8:57                                 ` Dan Shumow
2017-02-28 19:34                               ` Linus Torvalds
2017-02-28 19:52                                 ` Shawn Pearce
2017-02-28 22:56                                   ` Linus Torvalds
2017-02-28 21:22                                 ` Dan Shumow
2017-02-28 22:50                                   ` Marc Stevens
2017-02-28 23:11                                     ` Linus Torvalds
2017-03-01 19:05                                       ` Jeff King
2017-02-23 20:47               ` Øyvind A. Holm
2017-02-23 20:46             ` Joey Hess
2017-02-23 18:42         ` Jeff King
2017-02-23 17:52     ` David Lang
2017-02-23 19:20   ` David Lang
2017-02-23 17:19 ` Linus Torvalds
2017-02-23 17:29   ` Linus Torvalds
2017-02-23 18:10   ` Joey Hess
2017-02-23 18:29     ` Linus Torvalds
2017-02-23 18:38     ` Junio C Hamano
2017-02-24  9:42 ` Duy Nguyen
2017-02-25 19:04   ` brian m. carlson
2017-02-27 13:29     ` René Scharfe
2017-02-28 13:25       ` brian m. carlson
2017-02-24 15:13 ` Ian Jackson
2017-02-24 17:04   ` ankostis
2017-02-24 17:23   ` Jason Cooper
2017-02-25 23:22     ` ankostis
2017-02-24 17:32   ` Junio C Hamano
2017-02-24 17:45     ` David Lang
2017-02-24 18:14       ` Junio C Hamano
2017-02-24 18:58         ` Stefan Beller
2017-02-24 19:20           ` Junio C Hamano
2017-02-24 20:05             ` ankostis
2017-02-24 20:32               ` Junio C Hamano
2017-02-25  0:31                 ` ankostis
2017-02-26  0:16                   ` Jason Cooper
2017-02-26 17:38                     ` brian m. carlson
2017-02-26 19:11                       ` Linus Torvalds
2017-02-26 21:38                         ` Ævar Arnfjörð Bjarmason
2017-02-26 21:52                           ` Jeff King
2017-02-27 13:00                             ` Transition plan for git to move to a new hash function Ian Jackson
2017-02-27 14:37                               ` Why BLAKE2? Markus Trippelsdorf
2017-02-27 15:42                                 ` Ian Jackson
2017-02-27 19:26                               ` Transition plan for git to move to a new hash function Tony Finch
2017-02-28 21:47                               ` brian m. carlson
2017-03-02 18:13                                 ` Ian Jackson
2017-03-04 22:49                                   ` brian m. carlson
2017-03-05 13:45                                     ` Ian Jackson
2017-03-05 23:45                                       ` brian m. carlson
2017-02-24 20:05             ` SHA1 collisions found Junio C Hamano
2017-02-24 20:33           ` Philip Oakley
2017-02-24 23:39     ` Jeff King
2017-02-25  0:39       ` Linus Torvalds
2017-02-25  0:54         ` Linus Torvalds
2017-02-25  1:16         ` Jeff King
2017-02-26 18:55           ` Junio C Hamano
2017-02-25  6:10         ` Junio C Hamano
2017-02-26  1:13           ` Jason Cooper
2017-02-26  5:18             ` Jeff King
2017-02-26 18:30               ` brian m. carlson
2017-03-02 21:46               ` Brandon Williams
2017-03-03 11:13                 ` Jeff King
2017-03-03 14:54                   ` Ian Jackson
2017-03-03 22:18                     ` Jeff King
2017-03-02 19:55         ` Linus Torvalds
2017-03-02 20:43           ` Junio C Hamano
2017-03-02 21:21             ` Linus Torvalds
2017-03-02 21:54               ` Joey Hess
2017-03-02 22:27                 ` Linus Torvalds
2017-03-03  1:50                   ` Mike Hommey
2017-03-03  2:19                     ` Linus Torvalds
2017-03-03 11:04           ` Jeff King
2017-03-03 21:47           ` Stefan Beller
2017-02-25  1:00       ` David Lang
2017-02-25  1:15         ` Stefan Beller
2017-02-25  1:21         ` Jeff King
2017-02-25  1:39           ` David Lang
2017-02-25  1:47             ` Jeff King
2017-02-25  1:56               ` David Lang
2017-02-25  2:28             ` Jacob Keller
2017-02-25  2:26           ` Jacob Keller
2017-02-25  5:39             ` grarpamp
2017-02-24 23:43     ` Ian Jackson
2017-02-25  0:06       ` Ian Jackson
2017-02-25 18:50     ` brian m. carlson
2017-02-25 19:26       ` Jeff King
2017-02-25 22:09         ` Mike Hommey
2017-02-26 17:38           ` brian m. carlson
2017-02-24 22:47 ` Jakub Narębski
2017-02-24 22:53   ` Santiago Torres
2017-02-24 23:05     ` Jakub Narębski
2017-02-24 23:24       ` Øyvind A. Holm
2017-02-24 23:06   ` Jeff King
2017-02-24 23:35     ` Jakub Narębski
2017-02-25 22:35     ` Lars Schneider
2017-02-26  0:46       ` Jeff King
2017-02-26 18:22         ` Junio C Hamano
2017-02-26 18:57     ` Thomas Braun
2017-02-26 21:30       ` Jeff King
2017-02-27  9:57         ` Geert Uytterhoeven
2017-02-27 10:43           ` Jeff King
2017-02-27 12:39             ` Morten Welinder

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).