Re: Val Henson's critique of hash-based content storage systems

git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed

From: Linus Torvalds <torvalds@osdl.org>
To: Rob Jellinghaus <robj@unrealities.com>
Cc: Git Mailing List <git@vger.kernel.org>
Subject: Re: Val Henson's critique of hash-based content storage systems
Date: Fri, 29 Apr 2005 12:45:07 -0700 (PDT)	[thread overview]
Message-ID: <Pine.LNX.4.58.0504291221250.18901@ppc970.osdl.org> (raw)
In-Reply-To: <loom.20050429T015434-928@post.gmane.org>

On Fri, 29 Apr 2005, Rob Jellinghaus wrote:
> 
> If an attacker used an SHA-1 attack to create a blob that matched the hash of
> some well-known git object (say, the tree for Linux 2.7-rc1), and spammed public
> git repositories with it ahead of Linus's release, what would be the potential
> for mischief, and what would the recovery process be?

I really think people should not consider the sha1 the "security". 

The real security is in distribution. 

With the distributed setup, developers don't use public trees. They use 
their own _private_ trees, and the public ones are just staging areas for 
synchronization.

So in order to actually replace a blob, let's say that you can create an 
object with the right sha1 trivially. What then?

You now have to break into _every_ repository that has that object, and 
replace it silently. Because if you don't, the good one will still be 
around.

That's just not going to happen.

So let's say that you break into kernel.org, and replace one of the blobs
in my repository.  What happens?

First off, I'll never notice, because it's not actually my repository, so 
I won't even have the corrupt copy. So what _will_ happen?

What will happen is that people who download new stuff from kernel.org
will get the "evil" object. Not all of them, though - just the ones that
hadn't downloaded the proper one. So first off, in order to be really
_effective_, the attack really has to not just replace an object, it
really wants to replace a pretty _recent_ object, because replacing an old
just just doesn't do a whole lot.

So they get the evil object. What happens? NOTHING. Absolutely nada.  
Either they use that evil object, or they don't. Not using it might be
because it's not even top-of-tree any more, and you really just replaced
some old version of a file. Or it might be because it's a object for a
driver that you don't have, so you'd never see it.

So let's ignore that case, and say that the attacker has successfully
replaced an object that is (a) recent enough to matter and (b) actually
used.

What now? You'll get a compile error. Big deal. People will notice that
something is wrong, complain about it, we'll think they have disk
corruption for a while, and then we'll figure it out, and replace the
object. Done.

Why? Because even if you successfully find an object with the same SHA1, 
the likelihood that that object actually makes _sense_ in that conctext is 
pretty damn near zero. 

Think about it. We've had this before: people whose files got flipped
around due to driver bugs or just hardware problems, and even just a
single bit error most of the time results in real honest-to-God compiler
errors.

And because we found the bad one, and we have the good one somewhere else, 
who cares? The security industry will be all atwitter about somebody 
finding a matching SHA1 object, and it will be _huge_ news, but did it 
actually hurt the kernel integrity? No.

So let's say that somebody breaks in to _my_ personal machine. I'm behind 
a few firewalls and a NAT setup, and I don't accept even incoming ssh, but 
hey, they could crowbar my door and break in that way. 

ONLY A TOTAL IDIOT would then replace an object in my database with
something else. That would be _stupid_. He'd just guarantee that all the 
same problems as above were true, except now we'd have to find the 
good object in some _other_ database than mine.

So if you actually wanted to corrupt the kernel tree, you'd do it by just
fooling me into accepting a crap patch. Hey, it happens all the time.  
People send me buggy stuff. We figure out the bugs. What's so different
here?

In other words, the security isn't in the hash. The hash is an added level 
to make it much harder to fool, but it's not "the security". 

And if we are really really unlucky, and a meteorite hits us, and we get
an object collision that has the same sha1 for _real_, and actually makes
sense, then hey, shit happens. We can fix it by "poisoning" that sha1, and
modifying both files trivially so that they don't match any more, and then
we add a list of "illegal" sha1's to fsck, and we'll make that list be ten
entries long, just in case the meteorite strikes ten times, but the fact
is it's simply not going to happen.

(It's going to be very very obvious, very very quickly, btw: the person
who actually created the object that happened to collide will not write
the new SHA1 out, because he already "had" the same object, so next time
somebody updates the tree, the file that matches will now have the "old
contents" from some other colliding file, and the new code simply won't do
what it was supposed to. So don't worry about it - collisions, even if
they happen, will be noticed as quite obvious _bugs_ in the end result,
the same way we find the common source of bugs - bad programming).

In other words: don't depend on hashes if you only have one copy of the
data. But if you have backups of old versions (which essentially the
distribution guarantees as long as we have "stupid" mirrors that just look
at the filename) having a hash collision doesn't mean that you lost any
real data.

So anybody who thinks that a hash collision is a fundamental problem just
hasn't thought things through. It's an _annoyance_, nothing more. But we
have tons of much more pressing annoyances, and pretty much all of them
are a hell of a lot more likely than a collission, whether intentional or
unintentional.

			Linus

next prev parent reply	other threads:[~2005-04-29 19:39 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2005-04-29  0:06 Val Henson's critique of hash-based content storage systems Rob Jellinghaus
2005-04-29 19:45 ` Linus Torvalds [this message]
2005-04-29 19:52   ` Tom Lord
2005-04-29 20:17     ` C. Scott Ananian
2005-04-29 20:37       ` Tom Lord
2005-04-29 20:41         ` C. Scott Ananian
2005-04-29 20:14 ` H. Peter Anvin
2005-04-29 20:47 ` Morten Welinder

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Pine.LNX.4.58.0504291221250.18901@ppc970.osdl.org \
    --to=torvalds@osdl.org \
    --cc=git@vger.kernel.org \
    --cc=robj@unrealities.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).