git@vger.kernel.org list mirror (unofficial, one of many)
 help / color / mirror / code / Atom feed
* Merge commit diff results are confusing and inconsistent
@ 2019-05-03 15:55 Robert Dailey
  2019-05-03 19:12 ` Eckhard Maaß
  0 siblings, 1 reply; 12+ messages in thread
From: Robert Dailey @ 2019-05-03 15:55 UTC (permalink / raw)
  To: Git

I'm hoping this is mostly a learning opportunity for me. I'm assuming
things are working as designed, but I just don't understand something
fundamental.

I have a merge commit. HEAD is currently pointing at this merge
commit. To be exact, HEAD points to master, which points to the merge
commit. My goal is to diff only the changes in the merge commit (stuff
committed directly in the merge commit, such as conflict resolutions).
To start out, I learned about @^@, @^!, and @^-. @^! sounded like what
I wanted. It gives me this output:

$ git rev-parse @^!
21f5a4b9fee4f12e7793919f65361d2c16f7d240
^14bd840c1d591c9dc066ed1aab59b5ec14d502bb
^944af379480826764f2f31b67848e2885b95b4a6

Perfect. This should give me just the diff of 21f5... and exclude
everything else, right? So I did this:

$ git diff @^!

However, I get *all* changes on the branch (second parent) and changes
in the merge commit itself. Basically it acts as if I used @^-, which
seems wrong to me. So to test another angle, I used the revisions
output by rev-parse directly:

$ git diff 21f5a4b9fee4f12e7793919f65361d2c16f7d240
^14bd840c1d591c9dc066ed1aab59b5ec14d502bb
^944af379480826764f2f31b67848e2885b95b4a6

Interestingly, this showed me only the changes in the merge commit
(21f5a4) and nothing else. Between this command and @^!, I feel the
two are exactly the same. So why does @^! not work as I expect, but
explicitly specifying the revisions does? What am I missing here?

When I use @^! in `git log`, I do only see the merge commit and no
other commits. So at least log is treating it correctly.

$ git version
git version 2.20.1.windows.1

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Merge commit diff results are confusing and inconsistent
  2019-05-03 15:55 Merge commit diff results are confusing and inconsistent Robert Dailey
@ 2019-05-03 19:12 ` Eckhard Maaß
  2019-05-06 15:38   ` Robert Dailey
  0 siblings, 1 reply; 12+ messages in thread
From: Eckhard Maaß @ 2019-05-03 19:12 UTC (permalink / raw)
  To: Robert Dailey; +Cc: Git

On Fri, May 03, 2019 at 10:55:54AM -0500, Robert Dailey wrote:
> I have a merge commit. HEAD is currently pointing at this merge
> commit. To be exact, HEAD points to master, which points to the merge
> commit. My goal is to diff only the changes in the merge commit (stuff
> committed directly in the merge commit, such as conflict resolutions).

Hold on. Basically, there is no such thing as "committed directly" for a
merge. You only have differences of the commit to its parents. What you
aim for are changes that you cannot find in either preimage - and this
can be observed best with the --cc option. Maybe also interesting would
be -c for showing a comined diff and -m for showing diffs to parents
after one another.

> To start out, I learned about @^@, @^!, and @^-. @^! sounded like what
> I wanted. It gives me this output:
> 
> $ git rev-parse @^!
> 21f5a4b9fee4f12e7793919f65361d2c16f7d240
> ^14bd840c1d591c9dc066ed1aab59b5ec14d502bb
> ^944af379480826764f2f31b67848e2885b95b4a6
> 
> Perfect. This should give me just the diff of 21f5... and exclude
> everything else, right? So I did this:

There shouldn't be "just the diff of <commit>" - you always have to tell
where to diff it too, intrinsically Git does not save patches, but the
whole content, after all.

> 
> $ git diff @^!
> 
> However, I get *all* changes on the branch (second parent) and changes
> in the merge commit itself. Basically it acts as if I used @^-, which
> seems wrong to me. So to test another angle, I used the revisions
> output by rev-parse directly:
> 
> $ git diff 21f5a4b9fee4f12e7793919f65361d2c16f7d240
> ^14bd840c1d591c9dc066ed1aab59b5ec14d502bb
> ^944af379480826764f2f31b67848e2885b95b4a6
> 
> Interestingly, this showed me only the changes in the merge commit
> (21f5a4) and nothing else. Between this command and @^!, I feel the
> two are exactly the same. So why does @^! not work as I expect, but
> explicitly specifying the revisions does? What am I missing here?
> 
> When I use @^! in `git log`, I do only see the merge commit and no
> other commits. So at least log is treating it correctly.

Somebody else might know better why the diff actually produced the
results you were looking for. I admit it is puzzling to me - I would
have expected to error it out on the output of git rev-parse as there
are three items.

Greetings,
Eckhard

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Merge commit diff results are confusing and inconsistent
  2019-05-03 19:12 ` Eckhard Maaß
@ 2019-05-06 15:38   ` Robert Dailey
  2019-05-06 16:52     ` Eckhard Maaß
  2019-05-06 23:52     ` Ævar Arnfjörð Bjarmason
  0 siblings, 2 replies; 12+ messages in thread
From: Robert Dailey @ 2019-05-06 15:38 UTC (permalink / raw)
  To: Eckhard Maaß; +Cc: Git

I feel like you got hung up too much on exact wording of what I was
trying to describe. I do apologize I don't have the background to
explain things 100% accurately, especially at a low level. My
explanations are mostly intended to be as a user, based on what is
observable, and based on intent. I'll clarify in the quotes below...

On Fri, May 3, 2019 at 2:12 PM Eckhard Maaß
<eckhard.s.maass@googlemail.com> wrote:
> Hold on. Basically, there is no such thing as "committed directly" for a
> merge. You only have differences of the commit to its parents. What you
> aim for are changes that you cannot find in either preimage - and this
> can be observed best with the --cc option. Maybe also interesting would
> be -c for showing a comined diff and -m for showing diffs to parents
> after one another.

"Committed directly" here means that I made some changes, none of
which is part of a parent commit. Since no additional commits were
made following the merge, I assume that within the merge commit is
some type of diff. If I perform a merge, make some changes, and amend
those changes into the merge, in mind they ARE contained in that merge
commit. The underlying machinery doesn't matter here: This is the
observable state to the user.

Maybe the machinery, which I have no knowledge of or transparency
into, is important because it is affecting the behavior I'm seeing
when I do the diffs? Not sure...

> There shouldn't be "just the diff of <commit>" - you always have to tell
> where to diff it too, intrinsically Git does not save patches, but the
> whole content, after all.

I do understand this. But again, I'm not trying to be super technical
here. In plain english, all I'm trying to say is that I want to see
the changes that 1 commit introduces into the code base. So when it
comes to communicating the end result I want, I talk about it in terms
of 1 commit (the merge commit). The means to get that output is part
of my question and overall confusion. But as a baseline, I want to
clarify that I do understand a range is required input for the diff
command. In the case of merge commits, the way you specify the ranges
has many forms so I'm not sure based on the results I see, which one
is correct or what they all mean.

> Somebody else might know better why the diff actually produced the
> results you were looking for. I admit it is puzzling to me - I would
> have expected to error it out on the output of git rev-parse as there
> are three items.

Actually I can't think of any other command that can show me what
revision ranges translate to in "raw" commits. To me the raw forms are
always <sha1> and ^<sha1>, repeated as many times and in as many
orders necessary. Don't all of the vanity revision specifications
ultimately boil down to "from this parent" and "not from this parent"?

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Merge commit diff results are confusing and inconsistent
  2019-05-06 15:38   ` Robert Dailey
@ 2019-05-06 16:52     ` Eckhard Maaß
  2019-05-06 23:52     ` Ævar Arnfjörð Bjarmason
  1 sibling, 0 replies; 12+ messages in thread
From: Eckhard Maaß @ 2019-05-06 16:52 UTC (permalink / raw)
  To: Robert Dailey; +Cc: Eckhard Maaß, Git

On Mon, May 06, 2019 at 10:38:12AM -0500, Robert Dailey wrote:
> I feel like you got hung up too much on exact wording of what I was
> trying to describe. I do apologize I don't have the background to
> explain things 100% accurately, especially at a low level. My
> explanations are mostly intended to be as a user, based on what is
> observable, and based on intent. I'll clarify in the quotes below...

I doubt that what is observable is what you described.

> 
> On Fri, May 3, 2019 at 2:12 PM Eckhard Maaß
> <eckhard.s.maass@googlemail.com> wrote:
> > Hold on. Basically, there is no such thing as "committed directly" for a
> > merge. You only have differences of the commit to its parents. What you
> > aim for are changes that you cannot find in either preimage - and this
> > can be observed best with the --cc option. Maybe also interesting would
> > be -c for showing a comined diff and -m for showing diffs to parents
> > after one another.
> 
> "Committed directly" here means that I made some changes, none of
> which is part of a parent commit.

The merge strategy does the same when melding two versions of a file
into one. If one invents a more clever merge strategy, it might appear
that you have "some changes". This is not a mere technicality - you
cannot make a difference between parts that were conflicts in the
original commit and changes you introduced yourself - short of making
the merge again.

> Since no additional commits were
> made following the merge, I assume that within the merge commit is
> some type of diff. If I perform a merge, make some changes, and amend
> those changes into the merge, in mind they ARE contained in that merge
> commit. The underlying machinery doesn't matter here: This is the
> observable state to the user.

Well, they are contained in the merge commit, true. And they would show
in the diffs to the two parents.

> Maybe the machinery, which I have no knowledge of or transparency
> into, is important because it is affecting the behavior I'm seeing
> when I do the diffs? Not sure...
> 
> > There shouldn't be "just the diff of <commit>" - you always have to tell
> > where to diff it too, intrinsically Git does not save patches, but the
> > whole content, after all.
> 
> I do understand this. But again, I'm not trying to be super technical
> here. In plain english, all I'm trying to say is that I want to see
> the changes that 1 commit introduces into the code base.

I do not understand - I especially reiterated on the fundamental design
decision of Git here that one cannot speak of *the* change. This is not
just some small technical detail.

With a merge commit, there is no such thing as "*the change* introduced
into the code base". You can view it a few different ways:

- the content of the merge commit as a whole. However, this is not
  really a change, but the whole content.

- the diff to one of its parents. If you merge a feature branch to
  master, then git diff master^..master does give you the changes
  introduced in master by the merge (given that merge^ is the state
  beforehand)

- all the diffs (or condensed forms) to all parent commits. --cc helps
  you here by ignoring "uninteresting" hunks.

> So when it
> comes to communicating the end result I want, I talk about it in terms
> of 1 commit (the merge commit). The means to get that output is part
> of my question and overall confusion. But as a baseline, I want to
> clarify that I do understand a range is required input for the diff
> command. In the case of merge commits, the way you specify the ranges
> has many forms so I'm not sure based on the results I see, which one
> is correct or what they all mean.

I doubt that the intentions of the revision short hands you gave have
should have some meaningful transition to the diff machinery. For me,
here some technicality strikes and gives results which are
counterintuitive to me - for me, all your calls should result in errors.

> 
> > Somebody else might know better why the diff actually produced the
> > results you were looking for. I admit it is puzzling to me - I would
> > have expected to error it out on the output of git rev-parse as there
> > are three items.
> 
> Actually I can't think of any other command that can show me what
> revision ranges translate to in "raw" commits. To me the raw forms are
> always <sha1> and ^<sha1>, repeated as many times and in as many
> orders necessary. Don't all of the vanity revision specifications
> ultimately boil down to "from this parent" and "not from this parent"?

This seems to be very wrong for calculating a diff - you nee exactly to
points to compare. So it is always a "from one" and "to one".

Greetings,
Eckhard

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Merge commit diff results are confusing and inconsistent
  2019-05-06 15:38   ` Robert Dailey
  2019-05-06 16:52     ` Eckhard Maaß
@ 2019-05-06 23:52     ` Ævar Arnfjörð Bjarmason
  2019-05-07 14:10       ` Robert Dailey
  1 sibling, 1 reply; 12+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2019-05-06 23:52 UTC (permalink / raw)
  To: Robert Dailey; +Cc: Eckhard Maaß, Git


On Mon, May 06 2019, Robert Dailey wrote:

> I feel like you got hung up too much on exact wording of what I was
> trying to describe. I do apologize I don't have the background to
> explain things 100% accurately, especially at a low level. My
> explanations are mostly intended to be as a user, based on what is
> observable, and based on intent. I'll clarify in the quotes below...
>
> On Fri, May 3, 2019 at 2:12 PM Eckhard Maaß
> <eckhard.s.maass@googlemail.com> wrote:
>> Hold on. Basically, there is no such thing as "committed directly" for a
>> merge. You only have differences of the commit to its parents. What you
>> aim for are changes that you cannot find in either preimage - and this
>> can be observed best with the --cc option. Maybe also interesting would
>> be -c for showing a comined diff and -m for showing diffs to parents
>> after one another.
>
> "Committed directly" here means that I made some changes, none of
> which is part of a parent commit. Since no additional commits were
> made following the merge, I assume that within the merge commit is
> some type of diff. If I perform a merge, make some changes, and amend
> those changes into the merge, in mind they ARE contained in that merge
> commit. The underlying machinery doesn't matter here: This is the
> observable state to the user.
>
> Maybe the machinery, which I have no knowledge of or transparency
> into, is important because it is affecting the behavior I'm seeing
> when I do the diffs? Not sure...
>
>> There shouldn't be "just the diff of <commit>" - you always have to tell
>> where to diff it too, intrinsically Git does not save patches, but the
>> whole content, after all.
>
> I do understand this. But again, I'm not trying to be super technical
> here. In plain english, all I'm trying to say is that I want to see
> the changes that 1 commit introduces into the code base. So when it
> comes to communicating the end result I want, I talk about it in terms
> of 1 commit (the merge commit). The means to get that output is part
> of my question and overall confusion. But as a baseline, I want to
> clarify that I do understand a range is required input for the diff
> command. In the case of merge commits, the way you specify the ranges
> has many forms so I'm not sure based on the results I see, which one
> is correct or what they all mean.
>
>> Somebody else might know better why the diff actually produced the
>> results you were looking for. I admit it is puzzling to me - I would
>> have expected to error it out on the output of git rev-parse as there
>> are three items.
>
> Actually I can't think of any other command that can show me what
> revision ranges translate to in "raw" commits. To me the raw forms are
> always <sha1> and ^<sha1>, repeated as many times and in as many
> orders necessary. Don't all of the vanity revision specifications
> ultimately boil down to "from this parent" and "not from this parent"?

Maybe an example helps, let's say you have two paint buckets, one with
red paint, one with yellow paint. You mix them. What happens?

    (
        rm -rf /tmp/git &&
        git init /tmp/git &&
        cd /tmp/git &&
        git checkout -b red &&

        echo red >color.txt &&
        git add color.txt &&
        git commit -m"red" &&

        git checkout --orphan green &&
        git reset --hard &&
        echo green >color.txt &&
        git add color.txt &&
        git commit -m"green" &&

        git merge --allow-unrelated-histories red;
        echo yellow >color.txt &&
        git add color.txt &&
        git commit -m"red + green = yellow"
    )

I *think* what you're alluding to is trying to discover some sort of
change to whatever the default merge resolution would have been, which
in this case would be closer to:

    (echo green && echo red) >color.txt

But it's important to understand that the whole business of suggesting
how you should merge is just sugar that isn't in any way represented in
the object model that makes it into the repository.

In that model we just had one branch with "color.txt" containing "red",
and another with "green". Then we merged the two together and that
commit merged two histories together, did something to yield an end
result, and now the "color.txt" file contains "yellow".

But what single thing can you look at to describe how you ended up with
"yellow"? There isn't such a single thing, I just know that I have a
commit with two parents:

    $ git cat-file -p HEAD
    tree 6318a50d67e6de533498a4a0c9f46360cff6908a
    parent 2332fc6b40c1cbf9f5daf809f09eb4defdd2ce30
    parent 1707f13d2d236d61ac7496962ecebc50ffff5be3

And that if I diff against the 1st parent we went from green to yellow:

    $ git diff HEAD^1..HEAD
    diff --git a/color.txt b/color.txt
    index a5b73ed..d1ed081 100644
    --- a/color.txt
    +++ b/color.txt
    @@ -1 +1 @@
    -green
    +yellow

And the other from red to yellow:

    $ git diff HEAD^2..HEAD
    diff --git a/color.txt b/color.txt
    index a9d1386..d1ed081 100644
    --- a/color.txt
    +++ b/color.txt
    @@ -1 +1 @@
    -red
    +yellow

To the extent that we can show a single diff at all that's diff-tree's
--cc option:

    $ git diff-tree --cc HEAD
    e89ef1f780d7c979c18cc0f03fd74c560466ef03
    diff --cc color.txt
    index a5b73ed,a9d1386..d1ed081
    --- a/color.txt
    +++ b/color.txt
    @@@ -1,1 -1,1 +1,1 @@@
    - green
     -red
    ++yellow

Sometimes it makes things better, sometimes it's just more
confusing. It's what "git show" will use to render merge commits.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Merge commit diff results are confusing and inconsistent
  2019-05-06 23:52     ` Ævar Arnfjörð Bjarmason
@ 2019-05-07 14:10       ` Robert Dailey
  2019-05-07 14:41         ` Robert Dailey
                           ` (4 more replies)
  0 siblings, 5 replies; 12+ messages in thread
From: Robert Dailey @ 2019-05-07 14:10 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason; +Cc: Eckhard Maaß, Git

On Mon, May 6, 2019 at 6:52 PM Ævar Arnfjörð Bjarmason <avarab@gmail.com> wrote:
> Maybe an example helps, let's say you have two paint buckets, one with
> red paint, one with yellow paint. You mix them. What happens?
>
>     (
>         rm -rf /tmp/git &&
>         git init /tmp/git &&
>         cd /tmp/git &&
>         git checkout -b red &&
>
>         echo red >color.txt &&
>         git add color.txt &&
>         git commit -m"red" &&
>
>         git checkout --orphan green &&
>         git reset --hard &&
>         echo green >color.txt &&
>         git add color.txt &&
>         git commit -m"green" &&
>
>         git merge --allow-unrelated-histories red;
>         echo yellow >color.txt &&
>         git add color.txt &&
>         git commit -m"red + green = yellow"
>     )
>
> I *think* what you're alluding to is trying to discover some sort of
> change to whatever the default merge resolution would have been, which
> in this case would be closer to:
>
>     (echo green && echo red) >color.txt
>
> But it's important to understand that the whole business of suggesting
> how you should merge is just sugar that isn't in any way represented in
> the object model that makes it into the repository.
>
> In that model we just had one branch with "color.txt" containing "red",
> and another with "green". Then we merged the two together and that
> commit merged two histories together, did something to yield an end
> result, and now the "color.txt" file contains "yellow".
>
> But what single thing can you look at to describe how you ended up with
> "yellow"? There isn't such a single thing, I just know that I have a
> commit with two parents:
>
>     $ git cat-file -p HEAD
>     tree 6318a50d67e6de533498a4a0c9f46360cff6908a
>     parent 2332fc6b40c1cbf9f5daf809f09eb4defdd2ce30
>     parent 1707f13d2d236d61ac7496962ecebc50ffff5be3
>
> And that if I diff against the 1st parent we went from green to yellow:
>
>     $ git diff HEAD^1..HEAD
>     diff --git a/color.txt b/color.txt
>     index a5b73ed..d1ed081 100644
>     --- a/color.txt
>     +++ b/color.txt
>     @@ -1 +1 @@
>     -green
>     +yellow
>
> And the other from red to yellow:
>
>     $ git diff HEAD^2..HEAD
>     diff --git a/color.txt b/color.txt
>     index a9d1386..d1ed081 100644
>     --- a/color.txt
>     +++ b/color.txt
>     @@ -1 +1 @@
>     -red
>     +yellow
>
> To the extent that we can show a single diff at all that's diff-tree's
> --cc option:
>
>     $ git diff-tree --cc HEAD
>     e89ef1f780d7c979c18cc0f03fd74c560466ef03
>     diff --cc color.txt
>     index a5b73ed,a9d1386..d1ed081
>     --- a/color.txt
>     +++ b/color.txt
>     @@@ -1,1 -1,1 +1,1 @@@
>     - green
>      -red
>     ++yellow
>
> Sometimes it makes things better, sometimes it's just more
> confusing. It's what "git show" will use to render merge commits.

Your example is very helpful. I understand what you're saying for
conflicted lines. But the "whatever the default merge resolution would
have been" doesn't exist, because there's no reality where line 1 in
color.txt can be something "automatic" (i.e. deduced by git). The only
reality for the merge commit is some hand-edited replacement to line
1. So there is no "diff what I see with some alternate reality".

The majority use case I'm interested in is seeing net-positive changes
that happen in merge commits. Normally I take for granted that merge
commits have nothing meaningful in them (meaningful here defined as
something unexpected for a merge commit). But what if someone makes a
poor decision and does some crazy refactoring in 1 file and amends it
into a merge commit? Let's also say that these changes are done to a
file that wasn't modified in any parent (say a unrelated.txt next to
your color.txt). Since neither parent cares about that file for the
purposes of the merge, I am trying to make sense of a revision
specification that can be used to see what they did to that file.

Even ignoring that issue, the more concerning observation of mine is
that `diff @^!` produces any output at all. If you exclude both
parents, why do I see a diff for parent 2 (I see the complete diff of
the branch that was merged in)?

Again, thank you for your example, you definitely made things very
clear for me. I see where the confusion is. And I think --cc is a good
way to get more context. At this point I'm just concerned about the
@^! behavior with merge commits & diff.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Merge commit diff results are confusing and inconsistent
  2019-05-07 14:10       ` Robert Dailey
@ 2019-05-07 14:41         ` Robert Dailey
  2019-05-07 14:58         ` Denton Liu
                           ` (3 subsequent siblings)
  4 siblings, 0 replies; 12+ messages in thread
From: Robert Dailey @ 2019-05-07 14:41 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason; +Cc: Eckhard Maaß, Git

On Tue, May 7, 2019 at 9:10 AM Robert Dailey <rcdailey.lists@gmail.com> wrote:
> Your example is very helpful. I understand what you're saying for
> conflicted lines. But the "whatever the default merge resolution would
> have been" doesn't exist, because there's no reality where line 1 in
> color.txt can be something "automatic" (i.e. deduced by git). The only
> reality for the merge commit is some hand-edited replacement to line
> 1. So there is no "diff what I see with some alternate reality".
>
> The majority use case I'm interested in is seeing net-positive changes
> that happen in merge commits. Normally I take for granted that merge
> commits have nothing meaningful in them (meaningful here defined as
> something unexpected for a merge commit). But what if someone makes a
> poor decision and does some crazy refactoring in 1 file and amends it
> into a merge commit? Let's also say that these changes are done to a
> file that wasn't modified in any parent (say a unrelated.txt next to
> your color.txt). Since neither parent cares about that file for the
> purposes of the merge, I am trying to make sense of a revision
> specification that can be used to see what they did to that file.
>
> Even ignoring that issue, the more concerning observation of mine is
> that `diff @^!` produces any output at all. If you exclude both
> parents, why do I see a diff for parent 2 (I see the complete diff of
> the branch that was merged in)?
>
> Again, thank you for your example, you definitely made things very
> clear for me. I see where the confusion is. And I think --cc is a good
> way to get more context. At this point I'm just concerned about the
> @^! behavior with merge commits & diff.

Also I'm really confused how you got diff-tree to work. If I pick any
arbitrary SHA1 of a merge commit in my existing repo's history,
diff-tree produces only a SHA1 as the result:

$ git diff-tree --cc bdd47a73d
bdd47a73d18948aa46a8a7aa964543f0d989ffd4

I tried with just `-c` as well; same result.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Merge commit diff results are confusing and inconsistent
  2019-05-07 14:10       ` Robert Dailey
  2019-05-07 14:41         ` Robert Dailey
@ 2019-05-07 14:58         ` Denton Liu
  2019-05-07 15:55           ` Eckhard Maaß
  2019-05-07 15:26         ` Ævar Arnfjörð Bjarmason
                           ` (2 subsequent siblings)
  4 siblings, 1 reply; 12+ messages in thread
From: Denton Liu @ 2019-05-07 14:58 UTC (permalink / raw)
  To: Robert Dailey
  Cc: Ævar Arnfjörð Bjarmason, Eckhard Maaß, Git

Hi Robert,

On Tue, May 07, 2019 at 09:10:12AM -0500, Robert Dailey wrote:

[snip]

> Even ignoring that issue, the more concerning observation of mine is
> that `diff @^!` produces any output at all. If you exclude both
> parents, why do I see a diff for parent 2 (I see the complete diff of
> the branch that was merged in)?
> 
> Again, thank you for your example, you definitely made things very
> clear for me. I see where the confusion is. And I think --cc is a good
> way to get more context. At this point I'm just concerned about the
> @^! behavior with merge commits & diff.

@^! is undocumented behaviour. Junio touched on why it behaves this way
here[1], in case you're interested.

For more details, this code[2] just blindly diffs the first two
endpoints returned preceding `repo_init_revisions`.

Also, not to rehash an old discussion but I'll let this thread be my
argument *against* allowing range-notation in git-diff.

[1]: https://public-inbox.org/git/xmqqef7ch80v.fsf@gitster-ct.c.googlers.com/
[2]: https://github.com/gitster/git/blob/master/builtin/diff.c#L385

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Merge commit diff results are confusing and inconsistent
  2019-05-07 14:10       ` Robert Dailey
  2019-05-07 14:41         ` Robert Dailey
  2019-05-07 14:58         ` Denton Liu
@ 2019-05-07 15:26         ` Ævar Arnfjörð Bjarmason
  2019-05-07 16:44         ` Elijah Newren
  2019-05-11 14:08         ` Philip Oakley
  4 siblings, 0 replies; 12+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2019-05-07 15:26 UTC (permalink / raw)
  To: Robert Dailey; +Cc: Eckhard Maaß, Git, Elijah Newren


On Tue, May 07 2019, Robert Dailey wrote:

> On Mon, May 6, 2019 at 6:52 PM Ævar Arnfjörð Bjarmason <avarab@gmail.com> wrote:
>> Maybe an example helps, let's say you have two paint buckets, one with
>> red paint, one with yellow paint. You mix them. What happens?
>>
>>     (
>>         rm -rf /tmp/git &&
>>         git init /tmp/git &&
>>         cd /tmp/git &&
>>         git checkout -b red &&
>>
>>         echo red >color.txt &&
>>         git add color.txt &&
>>         git commit -m"red" &&
>>
>>         git checkout --orphan green &&
>>         git reset --hard &&
>>         echo green >color.txt &&
>>         git add color.txt &&
>>         git commit -m"green" &&
>>
>>         git merge --allow-unrelated-histories red;
>>         echo yellow >color.txt &&
>>         git add color.txt &&
>>         git commit -m"red + green = yellow"
>>     )
>>
>> I *think* what you're alluding to is trying to discover some sort of
>> change to whatever the default merge resolution would have been, which
>> in this case would be closer to:
>>
>>     (echo green && echo red) >color.txt
>>
>> But it's important to understand that the whole business of suggesting
>> how you should merge is just sugar that isn't in any way represented in
>> the object model that makes it into the repository.
>>
>> In that model we just had one branch with "color.txt" containing "red",
>> and another with "green". Then we merged the two together and that
>> commit merged two histories together, did something to yield an end
>> result, and now the "color.txt" file contains "yellow".
>>
>> But what single thing can you look at to describe how you ended up with
>> "yellow"? There isn't such a single thing, I just know that I have a
>> commit with two parents:
>>
>>     $ git cat-file -p HEAD
>>     tree 6318a50d67e6de533498a4a0c9f46360cff6908a
>>     parent 2332fc6b40c1cbf9f5daf809f09eb4defdd2ce30
>>     parent 1707f13d2d236d61ac7496962ecebc50ffff5be3
>>
>> And that if I diff against the 1st parent we went from green to yellow:
>>
>>     $ git diff HEAD^1..HEAD
>>     diff --git a/color.txt b/color.txt
>>     index a5b73ed..d1ed081 100644
>>     --- a/color.txt
>>     +++ b/color.txt
>>     @@ -1 +1 @@
>>     -green
>>     +yellow
>>
>> And the other from red to yellow:
>>
>>     $ git diff HEAD^2..HEAD
>>     diff --git a/color.txt b/color.txt
>>     index a9d1386..d1ed081 100644
>>     --- a/color.txt
>>     +++ b/color.txt
>>     @@ -1 +1 @@
>>     -red
>>     +yellow
>>
>> To the extent that we can show a single diff at all that's diff-tree's
>> --cc option:
>>
>>     $ git diff-tree --cc HEAD
>>     e89ef1f780d7c979c18cc0f03fd74c560466ef03
>>     diff --cc color.txt
>>     index a5b73ed,a9d1386..d1ed081
>>     --- a/color.txt
>>     +++ b/color.txt
>>     @@@ -1,1 -1,1 +1,1 @@@
>>     - green
>>      -red
>>     ++yellow
>>
>> Sometimes it makes things better, sometimes it's just more
>> confusing. It's what "git show" will use to render merge commits.

First, to update my example this is bette (I was being overly clever
with the unrelated histories):

    (
        rm -rf /tmp/git &&
        git init /tmp/git &&
        cd /tmp/git &&

	>color.txt &&
        git add color.txt &&
        git commit -m"base version" &&
        git tag base &&

        git checkout -b red base &&
        echo red >color.txt &&
        git add color.txt &&
        git commit -m"red" &&

        git checkout -b green base &&
        echo green >color.txt &&
        git add color.txt &&
        git commit -m"green" &&

        git checkout master &&
        git merge red green;
        echo yellow >color.txt &&
        git add color.txt &&
        echo hello >README.txt &&
        git add README.txt &&
        git commit -m"red + green = yellow"
    )


> Your example is very helpful. I understand what you're saying for
> conflicted lines. But the "whatever the default merge resolution would
> have been" doesn't exist, because there's no reality where line 1 in
> color.txt can be something "automatic" (i.e. deduced by git). The only
> reality for the merge commit is some hand-edited replacement to line
> 1. So there is no "diff what I see with some alternate reality".
>
> The majority use case I'm interested in is seeing net-positive changes
> that happen in merge commits. Normally I take for granted that merge
> commits have nothing meaningful in them (meaningful here defined as
> something unexpected for a merge commit). But what if someone makes a
> poor decision and does some crazy refactoring in 1 file and amends it
> into a merge commit? Let's also say that these changes are done to a
> file that wasn't modified in any parent (say a unrelated.txt next to
> your color.txt). Since neither parent cares about that file for the
> purposes of the merge, I am trying to make sense of a revision
> specification that can be used to see what they did to that file.

To clarify, by "doesn't exist" I mean in the sense of the underlying
object model. I.e. it's conceivable to think of an SCM that would store
what it thought was the merge resolution, followed by a diff from the
user on top.

You could even emulate that in git by having a policy of committing the
unsolved merge resolution followed by the "real" solution, but that's
not how anyone uses it.

That doesn't mean you can't get what you want, notice how in my updated
example I've sneakily put a new README.txt into the merge commit. Now
after that merge let's see what merge-tree would do given the two
un-merged tips (conflict):

    $ git merge-tree base HEAD^1 HEAD^2
    changed in both
      base   100644 e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 color.txt
      our    100644 a9d1386a1d99728de010ad4044fb984886b57f33 color.txt
      their  100644 a5b73ed2e94dafd12d02ab7b3ba3506cf8e642a2 color.txt
    @@ -1 +1,5 @@
    +<<<<<<< .our
     red
    +=======
    +green
    +>>>>>>> .their

v.s. what diff-tree --cc reports:

    $ git diff-tree --cc HEAD
    b6e4d986d8850a2465fd5e648c670865bba59c05
    diff --cc README.txt
    index 0000000,0000000..ce01362
    new file mode 100644
    --- /dev/null
    +++ b/README.txt
    @@@ -1,0 -1,0 +1,1 @@@
    ++hello
    diff --cc color.txt
    index a9d1386,a5b73ed..d1ed081
    --- a/color.txt
    +++ b/color.txt
    @@@ -1,1 -1,1 +1,1 @@@
    - red
     -green
    ++yellow

There's no reason we couldn't learn some mode where we walk the history
and report differences like that, but note that the "what would I get if
I merged" plumbing in git is still rather bad/incomplete in some areas,
and also changes between versions of Git.

E.g. Elijah Newren has been working on better rename detection recently,
if you ran some version of that merge-tree (or however it's exposed, I'm
not very familiar with this area) you'd get a "danger Will Robinson,
manually munged merge!", even though it was a straightforward resolution
of the suggested merge, and git's merge heuristics had just changed
between versions.

> Even ignoring that issue, the more concerning observation of mine is
> that `diff @^!` produces any output at all. If you exclude both
> parents, why do I see a diff for parent 2 (I see the complete diff of
> the branch that was merged in)?

I'm not sure, but I think this has to do with "diff" only taking two
endpoints, see its manpage. It's curious that when I feed just the first
two in manually I get the same as when it's passed to git-diff directly:

    $ git diff -p @^!
    diff --git a/README.txt b/README.txt
    new file mode 100644
    index 0000000..ce01362
    --- /dev/null
    +++ b/README.txt
    @@ -0,0 +1 @@
    +hello
    diff --git a/color.txt b/color.txt
    index a9d1386..d1ed081 100644
    --- a/color.txt
    +++ b/color.txt
    @@ -1 +1 @@
    -red
    +yellow
    $ git diff -p $(git rev-parse '@^!' | head -n 2)
    diff --git a/README.txt b/README.txt
    new file mode 100644
    index 0000000..ce01362
    --- /dev/null
    +++ b/README.txt
    @@ -0,0 +1 @@
    +hello
    diff --git a/color.txt b/color.txt
    index a9d1386..d1ed081 100644
    --- a/color.txt
    +++ b/color.txt
    @@ -1 +1 @@
    -red
    +yellow

But feeding all three into it gives output similar to --cc:

    $ git diff -p $(git rev-parse '@^!')
    diff --cc README.txt
    index 0000000,0000000..ce01362
    new file mode 100644
    --- /dev/null
    +++ b/README.txt
    @@@ -1,0 -1,0 +1,1 @@@
    ++hello
    diff --cc color.txt
    index a9d1386,a5b73ed..d1ed081
    --- a/color.txt
    +++ b/color.txt
    @@@ -1,1 -1,1 +1,1 @@@
    - red
     -green
    ++yellow

I don't know. I spent a while looking into this a while ago and then
paged it out of my brain :)

To reply to your downthread follow-up. The --cc mode won't work on all
commits due to what it's doing, but works on e.g. the merge commit
produced by my sample history.

> Again, thank you for your example, you definitely made things very
> clear for me. I see where the confusion is. And I think --cc is a good
> way to get more context. At this point I'm just concerned about the
> @^! behavior with merge commits & diff.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Merge commit diff results are confusing and inconsistent
  2019-05-07 14:58         ` Denton Liu
@ 2019-05-07 15:55           ` Eckhard Maaß
  0 siblings, 0 replies; 12+ messages in thread
From: Eckhard Maaß @ 2019-05-07 15:55 UTC (permalink / raw)
  To: Denton Liu
  Cc: Robert Dailey, Ævar Arnfjörð Bjarmason,
	Eckhard Maaß,
	Git

On Tue, May 07, 2019 at 10:58:49AM -0400, Denton Liu wrote:
> For more details, this code[2] just blindly diffs the first two
> endpoints returned preceding `repo_init_revisions`.

If you throw in more than two endpoints, the result is a combined diff
with respect to the first commit. You can have some fun with that, eg
"git diff -c @ @~4 @^^2~4".

> Also, not to rehash an old discussion but I'll let this thread be my
> argument *against* allowing range-notation in git-diff.

Well, I think the most confusing part is to have this undocumented
around. The range notation are basically just the same symbols for
git-diff, but the meaning seems to be quite different. If this is going
to stay - shouldn't there be at least some documentation for people
trying such stuff out? And that we can point to? Or is there a reason to
keep this undocumented?

Greetings,
Eckhard

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Merge commit diff results are confusing and inconsistent
  2019-05-07 14:10       ` Robert Dailey
                           ` (2 preceding siblings ...)
  2019-05-07 15:26         ` Ævar Arnfjörð Bjarmason
@ 2019-05-07 16:44         ` Elijah Newren
  2019-05-11 14:08         ` Philip Oakley
  4 siblings, 0 replies; 12+ messages in thread
From: Elijah Newren @ 2019-05-07 16:44 UTC (permalink / raw)
  To: Robert Dailey
  Cc: Ævar Arnfjörð Bjarmason, Eckhard Maaß, Git

On Tue, May 7, 2019 at 7:12 AM Robert Dailey <rcdailey.lists@gmail.com> wrote:
> On Mon, May 6, 2019 at 6:52 PM Ævar Arnfjörð Bjarmason <avarab@gmail.com> wrote:

> The majority use case I'm interested in is seeing net-positive changes
> that happen in merge commits. Normally I take for granted that merge
> commits have nothing meaningful in them (meaningful here defined as
> something unexpected for a merge commit). But what if someone makes a
> poor decision and does some crazy refactoring in 1 file and amends it
> into a merge commit? Let's also say that these changes are done to a
> file that wasn't modified in any parent (say a unrelated.txt next to
> your color.txt). Since neither parent cares about that file for the
> purposes of the merge, I am trying to make sense of a revision
> specification that can be used to see what they did to that file.

An ability to find evil merges/cherry-picks/reverts by diffing a merge
commit to an auto-merge of some sort is something I am planning to
tackle; I'm tracking it at
https://bugs.chromium.org/p/git/issues/detail?id=12.  I'm working on
filter-repo first, then the merge rewrite, and then if I'm not
distracted by other stuff (a big if), then I'll tackle this.  Both
those other projects that are first in the list are rather large,
though, so it may be a while...

> Even ignoring that issue, the more concerning observation of mine is
> that `diff @^!` produces any output at all. If you exclude both
> parents, why do I see a diff for parent 2 (I see the complete diff of
> the branch that was merged in)?

I think using ^! on merge commits with diff ought to be an error,
personally.  Not that I'd want to get into the battle of figuring out
if someone did figure out what it means and find a valid use for it to
determine if we need to deprecate or provide alternate syntax or who
knows what for it.  I know it can be explained to be a combined diff
of some sort, but that's it.  ^! as a post-fix operator on a non-merge
commit makes sense to me and is useful, even though it's an ugly hack
and hideous usability-wise from a teaching or explaining perspective.
Without digging into the code in detail, I wouldn't for the life of me
be able to explain how ^! will work on merge commits with diff.

> Again, thank you for your example, you definitely made things very
> clear for me. I see where the confusion is. And I think --cc is a good
> way to get more context. At this point I'm just concerned about the
> @^! behavior with merge commits & diff.

I am too, though my suggestion for that case is: please, for all that
is good in the world, DON'T USE THAT.  EVER.  Whatever it might do, I
certainly don't want to waste any brain cycles supporting it or making
sure it still works.  diff is an endpoint operation and people should
pass endpoints to diff, not ranges (with only two exceptions I can
think of: '^!' on non-merge commits and usage of the super confusing
and ugly but also necessary '...' for diff against a merge-base.)

Honestly, I am tempted to make git throw a warning whenever folks use
a range operator with diff other than the exceptions noted above.
Hmm...

Elijah

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Merge commit diff results are confusing and inconsistent
  2019-05-07 14:10       ` Robert Dailey
                           ` (3 preceding siblings ...)
  2019-05-07 16:44         ` Elijah Newren
@ 2019-05-11 14:08         ` Philip Oakley
  4 siblings, 0 replies; 12+ messages in thread
From: Philip Oakley @ 2019-05-11 14:08 UTC (permalink / raw)
  To: Robert Dailey, Ævar Arnfjörð Bjarmason
  Cc: Eckhard Maaß, Git

Hi Robert,

On 07/05/2019 15:10, Robert Dailey wrote:
> The majority use case I'm interested in is seeing net-positive changes
> that happen in merge commits. Normally I take for granted that merge
> commits have nothing meaningful in them (meaningful here defined as
> something unexpected for a merge commit). But what if someone makes a
> poor decision and does some crazy refactoring in 1 file and amends it
> into a merge commit? Let's also say that these changes are done to a
> file that wasn't modified in any parent (say a unrelated.txt next to
> your color.txt). Since neither parent cares about that file for the
> purposes of the merge, I am trying to make sense of a revision
> specification that can be used to see what they did to that file.

I see that you are specifically interested in seeing 'net-positive' changes.

Part of the problem is that for a merge commit there are multiple 
choices as to the implied initial central merge, where A and B are 
combined to create X [which I just called the central merge], to which 
further changes are made to create the final merge commit C. (Note: X is 
never committed, and is somewhat 'mythical')

These cases where there needs to be 'further changes', either to resolve 
conflicts because we never got a cleanly merged X, or the user added 
changes, we an "Evil Commit/Merge". Definitions vary slightly between 
different protagonists in the VCS world as to the best evil merge 
resolution starategies.

For your 'net-positive' changes, what is needed is to effectively 
generate that mythical clean initial merge X where either we delete from 
both sides, or we have a simple addition only from one side 
(addition/deletion normally being of whole lines). It is only that way 
that allows the changes from X to C to be addition only.

Unfortunately there is currently no diff representation that does that, 
as there is no method of indicating that middle X state. In the worst 
case there are always pathological cases.

A similar problem exists for the “reuse recorded resolution” (rerere / 
redo) storage of conflict resolutions. At present there isn't a way of 
exchanging such resolutions in a mechanism similar to a diff. In fact I 
was only just asking about that [1]  within the last two days! There is 
some discussion about the rerere database in [2], should you want a look.
--
Philip


[1] rerere - 
https://public-inbox.org/git/b8e56556-6c83-9e37-38e9-ac67f51b5cd2@iee.org/
[2] 
https://github.com/git/git/blob/master/Documentation/technical/rerere.txt

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2019-05-11 14:08 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-05-03 15:55 Merge commit diff results are confusing and inconsistent Robert Dailey
2019-05-03 19:12 ` Eckhard Maaß
2019-05-06 15:38   ` Robert Dailey
2019-05-06 16:52     ` Eckhard Maaß
2019-05-06 23:52     ` Ævar Arnfjörð Bjarmason
2019-05-07 14:10       ` Robert Dailey
2019-05-07 14:41         ` Robert Dailey
2019-05-07 14:58         ` Denton Liu
2019-05-07 15:55           ` Eckhard Maaß
2019-05-07 15:26         ` Ævar Arnfjörð Bjarmason
2019-05-07 16:44         ` Elijah Newren
2019-05-11 14:08         ` Philip Oakley

Code repositories for project(s) associated with this inbox:

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).