Hi Jonathan,

On Fri, Aug 13, 2010 at 03:00:31PM -0500, Jonathan Nieder wrote:
> Clemens Buchacher wrote:
> 
> > Since commit 2f82f760 (Take binary diffs into account for "git rebase"), binary
> > files are included in patch ID computation. Binary files are diffed using the
> > text diff algorithm, however
> [...]
> > Instead of hashing the diff of binary files, use the post-image sha1, which is
> > just as unique. As a result, performance is much improved.
> 
> Maybe it should use both the pre- and post-image?

That would make the patch ID more correct, in that it would identify
a particular change. But ultimately, we want to know whether or not
a change has been applied already. If the contents of a binary file
are the same in both commits, this is almost certainly true,
regardless of whether or not the pre-images match.

So I think we get better behavior if we ignore the pre-image,
although the difference is probably minuscule.

> 
> > diff --git a/diff.c b/diff.c
> > index 17873f3..20fc6db 100644
> > --- a/diff.c
> > +++ b/diff.c
> > @@ -3758,6 +3758,12 @@ static int diff_get_patch_id(struct diff_options *options, unsigned char *sha1)
> >  					len2, p->two->path);
> >  		git_SHA1_Update(&ctx, buffer, len1);
> >  
> > +		if (diff_filespec_is_binary(p->two)) {
> > +			len1 = sprintf(buffer, "%s", sha1_to_hex(p->two->sha1));
> > +			git_SHA1_Update(&ctx, buffer, len1);
> 
> 
> i.e., maybe also
> 
> 			git_SHA1_Update(&ctx, sha1_to_hex(p->one->sha1), 40);

Thanks.