Linus Torvalds wrote:
> I haven't seen the attack yet, but git doesn't actually just hash the
> data, it does prepend a type/length field to it. That usually tends to
> make collision attacks much harder, because you either have to make
> the resulting size the same too, or you have to be able to also edit
> the size field in the header.

I have some sha1 collisions (and other fun along these lines) in
https://github.com/joeyh/supercollider

That includes two files with the same SHA and size, which do get
different blobs thanks to the way git prepends the header to the
content.

joey@darkstar:~/tmp/supercollider>sha1sum bad.pdf good.pdf
d00bbe65d80f6d53d5c15da7c6b4f0a655c5a86a  bad.pdf
d00bbe65d80f6d53d5c15da7c6b4f0a655c5a86a  good.pdf
joey@darkstar:~/tmp/supercollider>git ls-tree HEAD
100644 blob ca44e9913faf08d625346205e228e2265dd12b65	bad.pdf
100644 blob 5f90b67523865ad5b1391cb4a1c010d541c816c1	good.pdf

While appending identical data to these colliding files does generate
other collisions, prepending data does not.

It would cost 6500 CPU years + 100 GPU years to generate valid colliding
git objects using the methods of the paper's authors. That might be cost
effective if it helped get a backdoor into eg, the kernel.

> (b) we can probably easily add some extra sanity checks to the opaque
> data we do have, to make it much harder to do the hiding of random
> data that these attacks pretty much always depend on.

For example, git fsck does warn about a commit message with opaque
data hidden after a NUL. But, git show/merge/pull give no indication
that something funky is going on when working with such commits.

-- 
see shy jo
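
To make the header point above concrete, here is a minimal Python sketch
of how a git blob id is derived: git hashes "blob <size>\0" followed by
the content, not the content alone. The function names plain_sha1 and
git_blob_id are mine, chosen for illustration; they are not part of git.
The sample input is arbitrary and is not one of the colliding PDFs.

```python
import hashlib


def plain_sha1(data: bytes) -> str:
    """SHA-1 of the raw bytes, which is what sha1sum reports for a file."""
    return hashlib.sha1(data).hexdigest()


def git_blob_id(data: bytes) -> str:
    """SHA-1 of the content with git's blob header prepended.

    The header is the object type, a space, the decimal content length,
    and a NUL byte.
    """
    header = b"blob " + str(len(data)).encode("ascii") + b"\0"
    return hashlib.sha1(header + data).hexdigest()


if __name__ == "__main__":
    sample = b"hello\n"  # arbitrary example content
    print("sha1sum-style:", plain_sha1(sample))
    print("git blob id:  ", git_blob_id(sample))
```

Because the header goes in front of the content, two files that collide
under plain SHA-1 start from different hash input once git has wrapped
them, which would explain why bad.pdf and good.pdf end up as different
blobs even though their sizes match.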