git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Junio C Hamano <gitster@pobox.com>
To: git@vger.kernel.org
Subject: [RFC PATCH] diff: do not use creation-half of -B as a rename target candidate
Date: Sun, 01 Feb 2015 19:18:35 -0800	[thread overview]
Message-ID: <xmqqfvapuhkk.fsf@gitster.dls.corp.google.com> (raw)
In-Reply-To: <xmqqegqaahnh.fsf@gitster.dls.corp.google.com> (Junio C. Hamano's message of "Sat, 31 Jan 2015 11:12:34 -0800")

When a commit creates new file B by copying the contents of an
existing file A and making a small edit and makes large edit to A,
"diff -M" would not see any copying or renaming, because the file A
appears in both preimage and postimage.  The output ends up showing
two large patches, one that adds almost the entirety of original A
to the newly created file B, and the other that removes almost the
entirety of the original contents from A and adds new material to
it.

"diff -B -M" was invented to allow us notice this case and instead
express the change as one patch that copies A to B with small edit,
and rewrites A with contents unrelated to its original.  The patch
expressed this way becomes much easier to read, because the reader
can see that most of the contents in B after the change came from
the original A (the patch header shows "copy from A" for B), and A
was completely rewritten by the patch (the patch body shows
everything removed first and then all new material added).

However this logic has a bug to incorrectly produce an unapplicable
patch in other cases.  Starting from existing files A and B, when a
commit removes A and makes the resulting B similar to the original
contents of A, it incorrectly expressed it as a change that renames
A to B and then makes small edits.  Such a patch will not apply to
the original state the patch was taken from, as B exists there.

The root cause of the problem is that after the complete rewrite of
B is detected and is internally split into deletion of old B and
creation of new B, the rename detection machinery matches the old
contents of A (which will go away) with the newly created B, because
they are similar.  Considering the deletion-half of the change to B
as possible rename source (i.e. from which a new file is created) is
good, but considering the creation-half as possible rename
destination (i.e. the file is created by taking whole contents from
elsewhere) is bad---because we know B being a broken filepair means
it already existed, and cannot have been newly _created_.

For a simple reproduction, go to your copy of the kernel tree and do
this:

    $ git diff -B -M v2.6.13 v2.6.12 -- \
        arch/ppc64/kernel/rtas_pci.c arch/ppc64/kernel/pSeries_pci.c >:patch

    $ git reset --hard
    $ git checkout v2.6.13

    $ git apply --cached --whitespace=nowarn :patch
    error: arch/ppc64/kernel/pSeries_pci.c: already exists in index

In this example, rtas_pci.c and pSeries_pci.c corresponds to files A
and B, respectively, in the more general description of the problem
given earlier.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
---

 * Here is what I am at the moment; I cannot quite explain (hence I
   cannot convince myself) why this is the right solution, but it
   seems to make the above sample case work without breaking any
   existing tests.  It is possible that the tests that would break
   without the "&& !p->score" bit are expecting wrong results, but I
   didn't look at them in detail.

 diffcore-rename.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/diffcore-rename.c b/diffcore-rename.c
index 6c7a72f..f4e8e00 100644
--- a/diffcore-rename.c
+++ b/diffcore-rename.c
@@ -516,6 +516,8 @@ void diffcore_rename(struct diff_options *options)
 			else if (!DIFF_OPT_TST(options, RENAME_EMPTY) &&
 				 is_empty_blob_sha1(p->two->sha1))
 				continue;
+			else if (p->broken_pair && !p->score)
+				continue;
 			else
 				locate_rename_dst(p->two, 1);
 		}
-- 
2.3.0-rc2-165-gbd2cd9b

  reply	other threads:[~2015-02-02  3:18 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-01-31 19:12 [Git BUG] Please do not use "-B -M" in "diff" family for now Junio C Hamano
2015-02-02  3:18 ` Junio C Hamano [this message]
2015-02-02  5:52   ` [RFC PATCH] diff: do not use creation-half of -B as a rename target candidate Stefan Beller
2015-02-02  6:47     ` Yue Lin Ho
2015-02-02 18:25   ` Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=xmqqfvapuhkk.fsf@gitster.dls.corp.google.com \
    --to=gitster@pobox.com \
    --cc=git@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).