git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Jeff King <peff@peff.net>
To: Robert Dailey <rcdailey.lists@gmail.com>
Cc: Git <git@vger.kernel.org>
Subject: Re: diff.renames not working?
Date: Fri, 13 Sep 2019 23:30:17 -0400	[thread overview]
Message-ID: <20190914033017.GA30458@sigill.intra.peff.net> (raw)
In-Reply-To: <CAHd499BT35jvPtsuD9gfJB0HJ=NxtzyQOaiD7-=sHJbFYhphpg@mail.gmail.com>

On Fri, Sep 13, 2019 at 03:24:06PM -0500, Robert Dailey wrote:

> Now my goal is to diff `ZPayClient.hpp` and see the changes to the
> moved-out portion of code as it relates to the original state of that
> code in `JniPaymentManager.hpp`. To do this, I tried this command:
> 
> ```
> $ git diff master...topic -- ZPay/ZPayClient.hpp
> ```
> 
> The unified diff header I got back is:
> 
> ```
> diff --git ZPay/ZPayClient.hpp ZPay/ZPayClient.hpp
> new file mode 100644
> index 00000000..6ebc2a9a
> --- /dev/null
> +++ ZPay/ZPayClient.hpp
> ```
> 
> Hmm, it's treating it as a new file. Even though I have `diff.renames`
> set to `copies`? Even though `diff --name-status` acknowledges the
> relationship with the original file for the code on `master`? This is
> confusing...

This is due to the way that rename detection works. When looking for
renames, the tree diff gives us a series of candidate "sources" that
were deleted and candidates "dests" that were added. And then we try to
match them up.

Copy detection differs in that it uses any file touched in the diff as a
source, not just deletions.

And then "--find-copies-harder" uses even unmodified files as sources
(but see below).

So here's a simplified setup:

  git init repo
  cd repo

  seq 100 >a
  git add a
  git commit -m a

  cp a b
  seq 99 >a
  git add a b
  git commit -m b

and then we can try these commands:

  # this won't find a rename, because "a" was not deleted
  git diff-tree --name-status -M HEAD^ HEAD

  # this will find the copy, because now we consider "a" a source
  git diff-tree --name-status -C HEAD^ HEAD

  # this _won't_ find the copy, because we limited our tree diff to
  # just look at "b"; hence we don't even consider "a" a source
  git diff-tree --name-status -C HEAD^ HEAD -- b

And that last one is the one that confused you. Naively it seems like
doing this would work (two "-C" are the same as "--find-copies-harder"):

  git diff-tree --name-status -C -C HEAD^ HEAD -- b

but it doesn't. That's because we're still using the tree diff to find
sources, and just adding unmodified entries to the source list. But our
pathspec prevents the diff from even considering "a". While this might
seem useless at first, you can imagine something like:

  git diff-tree -C -C HEAD^ HEAD -- subdir/

which would consider all files in subdir as sources, but not those
outside (which may be especially important for performance).

But there's no (clean) way to expand the set of paths that we consider
as sources without also showing them in the output. There are two
useful variants I could imagine, though:

  - a way to consider _all_ paths in the repository, not just those in
    the pathspec, as sources, but show only the entries from the
    pathspec. This could probably be a "harder" version of
    "--find-copies-harder", something like "-C -C -C <revs> -- b".
    Naturally this would be even more expensive in a big repo.

  - a way to independently specify the source pathspec and the
    output-limiting pathspec. This is a cheaper version of the one
    above, where you could look at a subset of the tree a sources, but
    limit the set of shown paths even further. It's not conceptually
    that difficult, but syntactically it gets weird since you have two
    lists of pathspecs on the command-line.

I think the first one wouldn't be _too_ hard if somebody is interested
in getting their feet wet with the diffcore-rename.c code. The second is
probably not worth the effort.

> Out of curiosity, I thought I'd try this command:
> 
> ```
> git diff --follow master...topic -- ZPay/ZPayClient.hpp
> ```

So yes, that does work, and is why I added the "(clean)" qualifier
above. It behaves like the "-C -C -C" I proposed. But the fact that it
does so is entirely accidental. What happens is this:

  - we're a little sloppy about what constitutes a traversal option and
    what is a diff option. Many diff commands rely on setup_revisions(),
    which parses both. So "diff --follow" probably _should_ be flagged
    as an error, but isn't.

  - the implementation of "--follow" works by doing a separate,
    from-scratch tree-level diff on each commit (it _has_ to ignore your
    pathspec, since by definition it allows only a single file to be in
    the pathspec). And then rather than throwing away that result, it
    feeds it to the rest of the diff pipeline, which then shows the
    output you expected.

So it does do what you want, but only for the single-file case. And
certainly it was never intended to, and that might change in the future.

> Now this looks more like it. I can actually see a useful diff here,
> instead of everything looking like a new file. But there is a lot of
> confusion here:
> 
> 1. `diff --follow` is not a documented[1] option. Why does it work?

Accident. :) See above.

> 2. `diff -M` doesn't actually work either. It should, though. In fact,
> I expected it to work as `--follow` does. But it doesn't.

It doesn't work because this is a copy, not a rename.

> 3. The `diff.renames` config doesn't seem to be working here, when it should.

It does, but the pathspec prevents it from finding a source candidate.

-Peff

  parent reply	other threads:[~2019-09-14  3:30 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-09-13 20:24 diff.renames not working? Robert Dailey
2019-09-13 22:36 ` Bryan Turner
2019-09-14  3:30 ` Jeff King [this message]
2019-09-16 14:25   ` Robert Dailey
2019-09-16 17:35   ` Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190914033017.GA30458@sigill.intra.peff.net \
    --to=peff@peff.net \
    --cc=git@vger.kernel.org \
    --cc=rcdailey.lists@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).