Linus Torvalds wrote:
> I haven't seen the attack yet, but git doesn't actually just hash the
> data, it does prepend a type/length field to it. That usually tends to
> make collision attacks much harder, because you either have to make
> the resulting size the same too, or you have to be able to also edit
> the size field in the header.

I have some SHA-1 collisions (and other fun along these lines) in
https://github.com/joeyh/supercollider

That includes two files with the same SHA-1 and size, which still get
different blob IDs because git prepends its header to the content
before hashing.

joey@darkstar:~/tmp/supercollider>sha1sum  bad.pdf good.pdf 
d00bbe65d80f6d53d5c15da7c6b4f0a655c5a86a  bad.pdf
d00bbe65d80f6d53d5c15da7c6b4f0a655c5a86a  good.pdf
joey@darkstar:~/tmp/supercollider>git ls-tree HEAD
100644 blob ca44e9913faf08d625346205e228e2265dd12b65	bad.pdf
100644 blob 5f90b67523865ad5b1391cb4a1c010d541c816c1	good.pdf
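
To make the difference concrete, here is a minimal sketch in Python of the
two hashes involved. The function names are mine, not git's; the only thing
taken from git is the "blob <size>\0" header that gets hashed ahead of the
file content.

```python
import hashlib


def plain_sha1(data: bytes) -> str:
    """What sha1sum prints: SHA-1 over the raw file content only."""
    return hashlib.sha1(data).hexdigest()


def git_blob_id(data: bytes) -> str:
    """What git stores a file under: SHA-1 over a 'blob <size>\\0' header
    followed by the content."""
    header = b"blob " + str(len(data)).encode("ascii") + b"\0"
    return hashlib.sha1(header + data).hexdigest()


if __name__ == "__main__":
    sample = b"hello world\n"
    print("sha1sum :", plain_sha1(sample))
    print("git blob:", git_blob_id(sample))
```

Reading bad.pdf and good.pdf as bytes and passing them through both
functions should reproduce the pattern above: one shared plain_sha1 value,
two different git_blob_id values.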

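Appending identical data to these colliding files does generate other
collisions, because SHA-1's internal state is already identical once the
colliding blocks have been processed. Prepending data does not, since it
changes the state the colliding blocks start from.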
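
The append/prepend asymmetry can be shown with a toy hash. This is only a
sketch: toy_hash is a made-up Merkle-Damgard construction with a 24-bit
chaining state (so that collisions can be brute-forced in moments), not
SHA-1 itself, and every name and constant in it is an assumption of mine.

```python
import hashlib

BLOCK = 4  # toy block size in bytes
STATE = 3  # toy chaining state in bytes (24 bits, deliberately weak)
IV = b"\0" * STATE


def compress(state: bytes, block: bytes) -> bytes:
    """Toy compression function: truncated SHA-1 of state + block."""
    return hashlib.sha1(state + block).digest()[:STATE]


def toy_hash(msg: bytes) -> bytes:
    """Iterate compress() over zero-padded blocks, then over a length block."""
    padded = msg + b"\0" * (-len(msg) % BLOCK)
    padded += len(msg).to_bytes(BLOCK, "big")
    state = IV
    for i in range(0, len(padded), BLOCK):
        state = compress(state, padded[i:i + BLOCK])
    return state


def find_collision():
    """Brute-force two distinct one-block messages that leave the chaining
    state identical, which is the kind of collision the real attack builds."""
    seen = {}
    n = 0
    while True:
        block = n.to_bytes(BLOCK, "big")
        state = compress(IV, block)
        if state in seen:
            return seen[state], block
        seen[state] = block
        n += 1


if __name__ == "__main__":
    a, b = find_collision()
    print("plain    :", toy_hash(a) == toy_hash(b))
    print("appended :", toy_hash(a + b"tail") == toy_hash(b + b"tail"))
    print("prepended:", toy_hash(b"hdr:" + a) == toy_hash(b"hdr:" + b))
```

The plain and appended comparisons hold by construction, since both
messages leave the same state behind. The prepended one is expected to
fail except by a roughly 1 in 2^24 accident, because the colliding blocks
now start from a different state than the one they were searched against.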

It would cost 6500 CPU years + 100 GPU years to generate valid colliding
git objects using the methods of the paper's authors. That might be
cost-effective if it helped get a backdoor into, e.g., the kernel.

>  (b) we can probably easily add some extra sanity checks to the opaque
> data we do have, to make it much harder to do the hiding of random
> data that these attacks pretty much always depend on.

For example, git fsck does warn about a commit message with opaque
data hidden after a NUL. But git show, merge, and pull give no indication
that something funky is going on when working with such commits.
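
A check along those lines is not hard to sketch. The helper below is
hypothetical (it is not how git fsck is implemented) and assumes it is
handed the raw bytes of a commit object, for example the output of
git cat-file commit <id>:

```python
from typing import Optional


def first_nul_offset(raw_commit: bytes) -> Optional[int]:
    """Return the offset of the first NUL byte in a raw commit, or None.

    Anything after that offset is data that tools printing the message
    as a C string would silently drop.
    """
    idx = raw_commit.find(b"\0")
    return None if idx < 0 else idx


if __name__ == "__main__":
    # Made-up commit body, for illustration only.
    suspicious = b"tree 0000\n\nfix typo\n\0opaque padding here"
    offset = first_nul_offset(suspicious)
    if offset is not None:
        hidden = len(suspicious) - offset - 1
        print("NUL at byte", offset, "with", hidden, "hidden bytes after it")
```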

-- 
see shy jo