* Val Henson's critique of hash-based content storage systems
From: Rob Jellinghaus @ 2005-04-29 0:06 UTC
To: git

I assume most people here have read this, but just in case:

http://www.usenix.org/events/hotos03/tech/full_papers/henson/henson.pdf

Is git vulnerable to attacks in the event that SHA-1 is broken?

If an attacker used an SHA-1 attack to create a blob that matched the hash of some well-known git object (say, the tree for Linux 2.7-rc1), and spammed public git repositories with it ahead of Linus's release, what would be the potential for mischief, and what would the recovery process be?

It seems that git is optimized to support networks of trust, so provided you accept only signed commits from people you trust, it's likely that corruption and mischief can be mostly avoided. But probably not completely; there is still a window of vulnerability.

It seems that git repositories could (at great expense) be regenerated to use a new hash algorithm. Is that the plan if SHA-1 is compromised (or comes so close to compromise as to make Linus nervous ;-)?

Cheers,
Rob

* Re: Val Henson's critique of hash-based content storage systems
From: Linus Torvalds @ 2005-04-29 19:45 UTC
To: Rob Jellinghaus
Cc: Git Mailing List

On Fri, 29 Apr 2005, Rob Jellinghaus wrote:
>
> If an attacker used an SHA-1 attack to create a blob that matched the hash of
> some well-known git object (say, the tree for Linux 2.7-rc1), and spammed public
> git repositories with it ahead of Linus's release, what would be the potential
> for mischief, and what would the recovery process be?

I really think people should not consider the sha1 the "security".

The real security is in distribution. With the distributed setup, developers don't use public trees. They use their own _private_ trees, and the public ones are just staging areas for synchronization.

So in order to actually replace a blob, let's say that you can create an object with the right sha1 trivially. What then? You now have to break into _every_ repository that has that object, and replace it silently. Because if you don't, the good one will still be around.

That's just not going to happen.

So let's say that you break into kernel.org, and replace one of the blobs in my repository. What happens?

First off, I'll never notice, because it's not actually my repository, so I won't even have the corrupt copy.

So what _will_ happen? What will happen is that people who download new stuff from kernel.org will get the "evil" object. Not all of them, though - just the ones that hadn't downloaded the proper one. So first off, in order to be really _effective_, the attack really has to not just replace an object, it really wants to replace a pretty _recent_ object, because replacing an old one just doesn't do a whole lot.

So they get the evil object. What happens?

NOTHING. Absolutely nada. Either they use that evil object, or they don't. Not using it might be because it's not even top-of-tree any more, and you really just replaced some old version of a file. Or it might be because it's an object for a driver that you don't have, so you'd never see it.

So let's ignore that case, and say that the attacker has successfully replaced an object that is (a) recent enough to matter and (b) actually used. What now?

You'll get a compile error. Big deal. People will notice that something is wrong, complain about it, we'll think they have disk corruption for a while, and then we'll figure it out, and replace the object. Done.

Why? Because even if you successfully find an object with the same SHA1, the likelihood that that object actually makes _sense_ in that context is pretty damn near zero. Think about it. We've had this before: people whose files got flipped around due to driver bugs or just hardware problems, and even just a single bit error most of the time results in real honest-to-God compiler errors.

And because we found the bad one, and we have the good one somewhere else, who cares? The security industry will be all atwitter about somebody finding a matching SHA1 object, and it will be _huge_ news, but did it actually hurt the kernel integrity? No.

So let's say that somebody breaks in to _my_ personal machine. I'm behind a few firewalls and a NAT setup, and I don't accept even incoming ssh, but hey, they could crowbar my door and break in that way.

ONLY A TOTAL IDIOT would then replace an object in my database with something else. That would be _stupid_. He'd just guarantee that all the same problems as above were true, except now we'd have to find the good object in some _other_ database than mine.

So if you actually wanted to corrupt the kernel tree, you'd do it by just fooling me into accepting a crap patch. Hey, it happens all the time. People send me buggy stuff.
We figure out the bugs. What's so different here?

In other words, the security isn't in the hash. The hash is an added level to make it much harder to fool, but it's not "the security".

And if we are really really unlucky, and a meteorite hits us, and we get an object collision that has the same sha1 for _real_, and actually makes sense, then hey, shit happens. We can fix it by "poisoning" that sha1, and modifying both files trivially so that they don't match any more, and then we add a list of "illegal" sha1's to fsck, and we'll make that list be ten entries long, just in case the meteorite strikes ten times, but the fact is it's simply not going to happen.

(It's going to be very very obvious, very very quickly, btw: the person who actually created the object that happened to collide will not write the new SHA1 out, because he already "had" the same object, so next time somebody updates the tree, the file that matches will now have the "old contents" from some other colliding file, and the new code simply won't do what it was supposed to. So don't worry about it - collisions, even if they happen, will be noticed as quite obvious _bugs_ in the end result, the same way we find the common source of bugs - bad programming).

In other words: don't depend on hashes if you only have one copy of the data. But if you have backups of old versions (which essentially the distribution guarantees as long as we have "stupid" mirrors that just look at the filename) having a hash collision doesn't mean that you lost any real data.

So anybody who thinks that a hash collision is a fundamental problem just hasn't thought things through. It's an _annoyance_, nothing more. But we have tons of much more pressing annoyances, and pretty much all of them are a hell of a lot more likely than a collision, whether intentional or unintentional.

		Linus
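
The "poisoning" idea in the message above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not git's fsck: `POISONED` and `check_object` are hypothetical names, and the blob naming (SHA-1 over `blob <size>`, a NUL byte, then the content) is how current git names blobs, which may not match the on-disk format in use when this thread was written.

```python
import hashlib

def git_blob_sha1(content: bytes) -> str:
    """Name a blob the way current git does: SHA-1 of 'blob <size>\\0' + content."""
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()

# Hypothetical local list of "illegal" object names, as suggested in the thread.
# It starts empty; entries would be added only after a real collision is found.
POISONED = set()

def check_object(name: str, content: bytes) -> str:
    """Classify an object as 'poisoned', 'corrupt', or 'ok' (hypothetical helper)."""
    if name in POISONED:
        # A known colliding name: refuse it no matter what the content hashes to.
        return "poisoned"
    if git_blob_sha1(content) != name:
        # Content does not hash to its own name: bit rot or tampering.
        return "corrupt"
    return "ok"

if __name__ == "__main__":
    name = git_blob_sha1(b"int main(void) { return 0; }\n")
    print(check_object(name, b"int main(void) { return 0; }\n"))
    print(check_object(name, b"something else entirely\n"))
```

Note that the hash check alone cannot catch a true collision, since both contents hash to the same name; that is why the list of poisoned names has to be kept separately.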

* Re: Val Henson's critique of hash-based content storage systems
From: Tom Lord @ 2005-04-29 19:52 UTC
To: git
Cc: robj

I wouldn't expect outright successful attacks like forged replacements for arbitrary files.

I would expect someone to have on hand a small number of blobs that are different but have the same hash and, eventually, to drop said files into a blob-based infrastructure to wreak havoc.

So: a way to locally mark a given checksum as "controversial" seems prudent, to me (hence, support for such in my blob-db code/spec).

-t

* Re: Val Henson's critique of hash-based content storage systems
From: C. Scott Ananian @ 2005-04-29 20:17 UTC
To: Tom Lord
Cc: git, robj

On Fri, 29 Apr 2005, Tom Lord wrote:

> I would expect someone to have on hand a small number of blobs that are
> different but have the same hash and, eventually, to drop said files
> into a blob-based infrastructure to wreak havoc.

This is just ridiculous. The number of known collisions in SHA1 is *exactly zero* at this point in time --- not guaranteed to stay that way, of course, but generating collisions is likely to remain relatively expensive for some time. The collisions are highly structured; they are not just arbitrary blobs.

After doing 2^69 work or so to generate a real honest-to-goodness SHA-1 collision, you think an attacker would "DROP THEM IN A REPOSITORY TO CREATE HAVOC"? You'd have to break into the repository, etc, and then you'd find that *NOTHING REFERENCED THEM* and so *ABSOLUTELY NOTHING WOULD HAPPEN*.

It's far more likely that SHA1 collisions will be used to generate forged X509 certificates, for a number of highly technical reasons. Git's highly constrained and derided 'brittle' file formats also serve to protect against the collision attacks against SHA-1 which are beginning to look possible.

> So: a way to locally mark a given checksum as "controversial" seems
> prudent, to me (hence, support for such in my blob-db code/spec).

Arguably that's what *upgrades* to the spec might be for -- git has a solid philosophy of not creating 'features' unless it is sure that they are needed/will be used, and I think this is always the wise route in software development. Of much specification comes no code.

And, if you actually create a 'flexible' blob-db spec with 'room for expansion' -- congratulations, you've just made yourself more vulnerable to collision attacks.

 --scott

terrorist MI5 SKILLET hack AMLASH security KMPLEBE KUFIRE SCRANTON D5 SLBM LINCOLN KUDESK SMOTH Kojarena Moscow HTAUTOMAT WSBURNT Chechnya
( http://cscott.net/ )

* Re: Val Henson's critique of hash-based content storage systems
From: Tom Lord @ 2005-04-29 20:37 UTC
To: cscott
Cc: git, robj

lord:
> I would expect someone to have on hand a small number of blobs that are
> different but have the same hash and, eventually, to drop said files
> into a blob-based infrastructure to wreak havoc.

cscott:
  This is just ridiculous. The number of known collisions in SHA1 is *exactly zero* at this point in time --- not guaranteed to stay that way, of course, but generating collisions is likely to remain relatively expensive for some time.

Blob-dbs and the low-level object system (trees, file-contents, and changesets) are pretty fundamental things. It is likely (and desirable) -- not guaranteed but likely (and desirable) -- that people will invest heavily in building infrastructure that operates solely at that level of abstraction. Arguably, that is already happening.

Simultaneously, it is very desirable that some mathematician somewhere will discover two bitstrings which are different but have the same SHA1 checksum, and then tell everyone in the world about their discovery.

My point is simply that blob-db implementations should assume that the mathematicians will succeed and take the small steps necessary to make sure that those bitstrings can't be used to crash a distributed blob-db infrastructure.

-t

* Re: Val Henson's critique of hash-based content storage systems
From: C. Scott Ananian @ 2005-04-29 20:41 UTC
To: Tom Lord
Cc: git, robj

On Fri, 29 Apr 2005, Tom Lord wrote:

> My point is simply that blob-db implementations should assume that the
> mathematicians will succeed and take the small steps necessary to make
> sure that those bitstrings can't be used to crash a distributed
> blob-db infrastructure.

And my point is that you haven't *begun* to describe how one might use an arbitrary hash collision to "crash a distributed blob-db infrastructure". Remember, first you've got to get some reference to your collision into the db... (and if you can do that, why are you mucking around with hash collisions?)

 --scott

Philadelphia PBPRIME STANDEL for Dummies milita Richard Tomlinson ESSENCE SUMAC Nader KUCLUB WSHOOFS QKENCHANT AK-47 AMQUACK supercomputer
( http://cscott.net/ )

* Re: Val Henson's critique of hash-based content storage systems
From: H. Peter Anvin @ 2005-04-29 20:14 UTC
To: Rob Jellinghaus
Cc: git

Rob Jellinghaus wrote:
> I assume most people here have read this, but just in case:
>
> http://www.usenix.org/events/hotos03/tech/full_papers/henson/henson.pdf
>

I have to pull out the big flamethrower, especially against someone I consider a friend, but that paper is a classic example of how many people don't understand probability.

The *only* valid criticism in it is that we may not know enough about the future validity of cryptographic hash functions; however, she barely analyzes the failure scenarios applicable to those kinds of failures at all.

In the end, the whole paper centers around "this makes me feel nervous", without really justifying it in any reasonable way. It is just one of many papers on cryptanalysis written by someone with no real background in the field. It really saddens me to see someone like Val fall into that particular trap.

	-hpa

* Re: Val Henson's critique of hash-based content storage systems
From: Morten Welinder @ 2005-04-29 20:47 UTC
To: Rob Jellinghaus
Cc: git

On 4/28/05, Rob Jellinghaus <robj@unrealities.com> wrote:
> I assume most people here have read this, but just in case:
>
> http://www.usenix.org/events/hotos03/tech/full_papers/henson/henson.pdf

The math in section 3 is bogus. 1-(1-2^-b)^n isn't hard to compute and even if it was, it is the wrong formula. (Set n==2^b; you obviously should get probability 1 for collision.)

The right formula is 1-B!/B^n/(B-n)! where B=2^b. For n=2^80 and b=160 you get about 39%.

Morten
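
The figure quoted above can be checked numerically. The sketch below is an illustration, not part of the original message: the function names are made up for the example, the exact form evaluates 1-B!/B^n/(B-n)! as a running product (practical only for small n), and the large case relies on the standard approximation 1-exp(-n(n-1)/(2B)), which is an assumption that holds when n is far smaller than B.

```python
import math
from fractions import Fraction

def collision_probability_exact(n: int, B: int) -> Fraction:
    """1 - B!/(B^n (B-n)!), computed as 1 - prod_{i<n} (B-i)/B. Small n only."""
    if n > B:
        # More objects than hash values: a collision is certain (pigeonhole).
        return Fraction(1)
    no_collision = Fraction(1)
    for i in range(n):
        no_collision *= Fraction(B - i, B)
    return 1 - no_collision

def collision_probability_approx(n: int, B: int) -> float:
    """Approximation 1 - exp(-n(n-1)/(2B)); assumes n is much smaller than B."""
    # expm1 keeps precision when the probability is tiny.
    return -math.expm1(-n * (n - 1) / (2 * B))

if __name__ == "__main__":
    # Classic birthday problem as a sanity check on the exact formula.
    print(float(collision_probability_exact(23, 365)))
    # The case from the message: n = 2^80 objects, b = 160 bits.
    print(collision_probability_approx(2**80, 2**160))  # about 0.39
```

For n=2^80 and B=2^160 the exponent is essentially -1/2, so the result is 1-exp(-1/2), which matches the "about 39%" in the message.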