Hi Jonathan,

On Fri, Aug 13, 2010 at 03:00:31PM -0500, Jonathan Nieder wrote:
> Clemens Buchacher wrote:
>
> > Since commit 2f82f760 (Take binary diffs into account for "git rebase"), binary
> > files are included in patch ID computation. Binary files are diffed using the
> > text diff algorithm, however
> [...]
> > Instead of hashing the diff of binary files, use the post-image sha1, which is
> > just as unique. As a result, performance is much improved.
>
> Maybe it should use both the pre- and post-image?

That would make the patch ID more correct in that it will identify a
particular change. But ultimately, we want to know whether or not a
change has been applied already. If the contents of a binary file
are the same in both commits, this is almost certainly true,
regardless of whether or not the pre-images match. So I think we
get better behavior if we ignore the pre-image. Although the
difference is probably minuscule.

> > diff --git a/diff.c b/diff.c
> > index 17873f3..20fc6db 100644
> > --- a/diff.c
> > +++ b/diff.c
> > @@ -3758,6 +3758,12 @@ static int diff_get_patch_id(struct diff_options *options, unsigned char *sha1)
> >  					len2, p->two->path);
> >  		git_SHA1_Update(&ctx, buffer, len1);
> >
> > +		if (diff_filespec_is_binary(p->two)) {
> > +			len1 = sprintf(buffer, "%s", sha1_to_hex(p->two->sha1));
> > +			git_SHA1_Update(&ctx, buffer, len1);
>
> i.e., maybe also
>
> 	git_SHA1_Update(&ctx, sha1_to_hex(p->one->sha1), 40);

Thanks.
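
To illustrate the trade-off discussed above, here is a minimal standalone
sketch, not the actual diff.c code. It assumes a toy 64-bit FNV-1a hash in
place of SHA-1, and the names toy_update() and binary_change_id() are
hypothetical helpers made up for this example. With use_pre == 0 only the
path and the post-image name are hashed, so two changes that end in the
same binary content get the same ID even if they started from different
pre-images. With use_pre == 1 the pre-image name is mixed in as well, and
the IDs differ.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define TOY_INIT 14695981039346656037ULL	/* FNV-1a 64-bit offset basis */

/* Toy stand-in for git_SHA1_Update(): 64-bit FNV-1a, NOT SHA-1. */
static uint64_t toy_update(uint64_t h, const char *data, size_t len)
{
	size_t i;

	for (i = 0; i < len; i++) {
		h ^= (unsigned char)data[i];
		h *= 1099511628211ULL;	/* FNV-1a 64-bit prime */
	}
	return h;
}

/*
 * Hypothetical helper: compute an ID for a change to a binary file
 * from its path and the object names of its pre- and post-image.
 * If use_pre is zero, the pre-image is ignored.
 */
static uint64_t binary_change_id(const char *path, const char *pre_hex,
				 const char *post_hex, int use_pre)
{
	uint64_t h = TOY_INIT;

	assert(path && pre_hex && post_hex);
	h = toy_update(h, path, strlen(path));
	if (use_pre)
		h = toy_update(h, pre_hex, strlen(pre_hex));
	h = toy_update(h, post_hex, strlen(post_hex));
	return h;
}
```

For example, binary_change_id("a.bin", "pre-A", "post-X", 0) and
binary_change_id("a.bin", "pre-B", "post-X", 0) are equal, while the same
two calls with use_pre set to 1 are not. The placeholder strings stand in
for the 40-character hex object names.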