git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* rebase - "fixup!" conflict applied at the wrong code location, why ?
@ 2020-07-08 23:03 Philippe Blain
  2020-07-08 23:07 ` Philippe Blain
  2020-07-09  4:07 ` Elijah Newren
  0 siblings, 2 replies; 5+ messages in thread
From: Philippe Blain @ 2020-07-08 23:03 UTC (permalink / raw)
  To: git

Hello,

I've been working on a branch for a while. I've been using `git commit --fixup`  and `git commit --squash`
when I noticed that I had forgotten to add something to a previous commit.
Today I did `git rebase --autosquash` to clean up my history, and the rebase failed at the
first 'fixup!' commit with a conflict. However, the conflict is not located at the right place 
in the code (it's not in the right subroutine!). This is very surprising to me, and I would 
like to understand why it happens.

Steps to reproduce:

git clone -b branch-to-be-rebased https://github.com/phil-blain/CICE.git cice
cd cice
git rebase -i --autosquash my-first-commit
# save the todo list without modifications
  Auto-merging <file> 
  CONFLICT (content): Merge conflict in <file>
  error: could not apply e8bfa55... fixup! <commit message of f4e1ae6>
# the rebase stops at f4e1ae6
git diff
# tangential question : for some reason the hunk header does not appear here, I don't know why... 
git diff -2  # but it appears here
git grep -p -e '<<<<<<< HEAD' -e '>>>>>>> e8bfa55...'  # or here
# ok, the conflict appears in subroutine 'picard_solver'
git show REBASE_HEAD -U5
# but the original "fixup!" commit only modifies the subroutine 'anderson_solver' !!

I would have expected that the conflict be created around lines 1118-1132 
(line numbers in f4e1ae6), in the 'anderson_solver' subroutine.

I don't know if this plays a part here, but commit f4e1ae6 (where the rebase stops)
is the commit where the 'anderson_solver' subroutine is added to the code...

Thanks,

Philippe.



^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: rebase - "fixup!" conflict applied at the wrong code location, why ?
  2020-07-08 23:03 rebase - "fixup!" conflict applied at the wrong code location, why ? Philippe Blain
@ 2020-07-08 23:07 ` Philippe Blain
  2020-07-09  4:07 ` Elijah Newren
  1 sibling, 0 replies; 5+ messages in thread
From: Philippe Blain @ 2020-07-08 23:07 UTC (permalink / raw)
  To: git


> Le 8 juil. 2020 à 19:03, Philippe Blain <levraiphilippeblain@gmail.com> a écrit :
> 
> Steps to reproduce:

Forgot to mention that all of this was tested with both Git 2.27.0 and 2.27.0.22.g20514004dd

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: rebase - "fixup!" conflict applied at the wrong code location, why ?
  2020-07-08 23:03 rebase - "fixup!" conflict applied at the wrong code location, why ? Philippe Blain
  2020-07-08 23:07 ` Philippe Blain
@ 2020-07-09  4:07 ` Elijah Newren
  2020-07-14  2:42   ` Philippe Blain
  1 sibling, 1 reply; 5+ messages in thread
From: Elijah Newren @ 2020-07-09  4:07 UTC (permalink / raw)
  To: Philippe Blain; +Cc: git

On Wed, Jul 8, 2020 at 5:07 PM Philippe Blain
<levraiphilippeblain@gmail.com> wrote:
>
> Hello,
>
> I've been working on a branch for a while. I've been using `git commit --fixup`  and `git commit --squash`
> when I noticed that I had forgotten to add something to a previous commit.
> Today I did `git rebase --autosquash` to clean up my history, and the rebase failed at the
> first 'fixup!' commit with a conflict. However, the conflict is not located at the right place
> in the code (it's not in the right subroutine!). This is very surprising to me, and I would
> like to understand why it happens.
>
> Steps to reproduce:
>
> git clone -b branch-to-be-rebased https://github.com/phil-blain/CICE.git cice
> cd cice
> git rebase -i --autosquash my-first-commit
> # save the todo list without modifications
>   Auto-merging <file>
>   CONFLICT (content): Merge conflict in <file>
>   error: could not apply e8bfa55... fixup! <commit message of f4e1ae6>
> # the rebase stops at f4e1ae6
> git diff
> # tangential question : for some reason the hunk header does not appear here, I don't know why...
> git diff -2  # but it appears here
> git grep -p -e '<<<<<<< HEAD' -e '>>>>>>> e8bfa55...'  # or here
> # ok, the conflict appears in subroutine 'picard_solver'
> git show REBASE_HEAD -U5
> # but the original "fixup!" commit only modifies the subroutine 'anderson_solver' !!
>
> I would have expected that the conflict be created around lines 1118-1132
> (line numbers in f4e1ae6), in the 'anderson_solver' subroutine.
>
> I don't know if this plays a part here, but commit f4e1ae6 (where the rebase stops)
> is the commit where the 'anderson_solver' subroutine is added to the code...
>
> Thanks,
>
> Philippe.

If you take a look where the rebase stops, you see:

$ git ls-files -u
100644 ee4377f1ec6836fa05573976a473373906c37d9f 1
cicecore/cicedynB/dynamics/ice_dyn_vp.F90
100644 30c699ac371c2a751052fa98d04317e84a96ec47 2
cicecore/cicedynB/dynamics/ice_dyn_vp.F90
100644 276f224e9048fe0bbd7c25822695049547362c87 3
cicecore/cicedynB/dynamics/ice_dyn_vp.F90

The difference from the merge base to "other" (index 3) is pretty
tiny, you just moved one line in the "anderson_solver" subroutine
about 10 lines down.  The more interesting difference is from the
merge base to "ours" (index 2), seen with:

$ git diff :1:cicecore/cicedynB/dynamics/ice_dyn_vp.F90
:2:cicecore/cicedynB/dynamics/ice_dyn_vp.F90

If you search for "anderson_solver" in that diff, you see that the
reconstructed diff looks like "subroutine anderson_solver" is renamed
to "subroutine picard_solver" with similar but slightly different
arguments.  The diff routine successfully finds lots of lines common
between these two subroutines so it seems quite logical as a way of
representing the changes, especially since they occur near each other
in the order of the file.  If you keep looking further in the diff, it
says that a few hundred lines later there's another subroutine rename,
this time from "subroutine calc_zeta_Pr" to "subroutine
anderson_solver".  Looking at it this way, it's quite natural to apply
the change from the rebased commit to the picard_solver since the
other side of history "renamed" anderson_solver to picard_solver.

This might not make as much sense to you, since you tend to think in
terms of parse trees and "subroutines" are a high level unit.  But
diffs work in terms of lines and have no knowledge of any kind of file
structure or semantics.  If you have a string of functions A,B,C and
insert a new function Z on one side of history somewhere in a file
that happens to look a lot like existing functions (meaning several
identical lines between the two), then diffs won't necessarily treat
it as a contiguous block of inserted lines but instead compare
function Z on one side to function B on the other, then function B on
the first side to function C on the second, etc.

There are alternative diff algorithms that try to minimize the number
of changed lines shown by the diff, such as --histogram and
--patience, and they do shrink this particular diff, but they both
yield the same treatment of comparing "subroutine anderson_solver" to
"subroutine picard_solver" essentially treating it as a rename.  To
get a merge like you want, you'd need some kind of higher level
semantic merge, or at least need functions to not have so many lines
in common between them.

This kind of problem also causes big issues when reordering functions
in a file.  line-by-line diffs and the diff3 merge algorithm tend to
really struggle with those.  (See e.g.
https://www.cis.upenn.edu/~bcpierce/papers/diff3-short.pdf if you want
to read up more on these.)

Hope that helps,
Elijah

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: rebase - "fixup!" conflict applied at the wrong code location, why ?
  2020-07-09  4:07 ` Elijah Newren
@ 2020-07-14  2:42   ` Philippe Blain
  2020-07-14  5:45     ` Elijah Newren
  0 siblings, 1 reply; 5+ messages in thread
From: Philippe Blain @ 2020-07-14  2:42 UTC (permalink / raw)
  To: Elijah Newren; +Cc: git

Hi Elijah,

> Le 9 juil. 2020 à 00:07, Elijah Newren <newren@gmail.com> a écrit :
> 
> On Wed, Jul 8, 2020 at 5:07 PM Philippe Blain
> <levraiphilippeblain@gmail.com> wrote:
>> 
>> Hello,
>> 
>> I've been working on a branch for a while. I've been using `git commit --fixup`  and `git commit --squash`
>> when I noticed that I had forgotten to add something to a previous commit.
>> Today I did `git rebase --autosquash` to clean up my history, and the rebase failed at the
>> first 'fixup!' commit with a conflict. However, the conflict is not located at the right place
>> in the code (it's not in the right subroutine!). This is very surprising to me, and I would
>> like to understand why it happens.
>> 
>> Steps to reproduce:
>> 
>> git clone -b branch-to-be-rebased https://github.com/phil-blain/CICE.git cice
>> cd cice
>> git rebase -i --autosquash my-first-commit
>> # save the todo list without modifications
>>  Auto-merging <file>
>>  CONFLICT (content): Merge conflict in <file>
>>  error: could not apply e8bfa55... fixup! <commit message of f4e1ae6>
>> # the rebase stops at f4e1ae6
>> git diff
>> # tangential question : for some reason the hunk header does not appear here, I don't know why...
>> git diff -2  # but it appears here
>> git grep -p -e '<<<<<<< HEAD' -e '>>>>>>> e8bfa55...'  # or here
>> # ok, the conflict appears in subroutine 'picard_solver'
>> git show REBASE_HEAD -U5
>> # but the original "fixup!" commit only modifies the subroutine 'anderson_solver' !!
>> 
>> I would have expected that the conflict be created around lines 1118-1132
>> (line numbers in f4e1ae6), in the 'anderson_solver' subroutine.
>> 
>> I don't know if this plays a part here, but commit f4e1ae6 (where the rebase stops)
>> is the commit where the 'anderson_solver' subroutine is added to the code...
>> 
>> Thanks,
>> 
>> Philippe.
> 
> If you take a look where the rebase stops, you see:

First, thanks a lot for your answer. I have a few questions.

> 
> $ git ls-files -u
> 100644 ee4377f1ec6836fa05573976a473373906c37d9f 1
> cicecore/cicedynB/dynamics/ice_dyn_vp.F90
> 100644 30c699ac371c2a751052fa98d04317e84a96ec47 2
> cicecore/cicedynB/dynamics/ice_dyn_vp.F90
> 100644 276f224e9048fe0bbd7c25822695049547362c87 3
> cicecore/cicedynB/dynamics/ice_dyn_vp.F90
> 
> The difference from the merge base to "other" (index 3) is pretty
> tiny, you just moved one line in the "anderson_solver" subroutine
> about 10 lines down.  

Yes, the output from 
$ git diff :1:cicecore/cicedynB/dynamics/ice_dyn_vp.F90 :3:cicecore/cicedynB/dynamics/ice_dyn_vp.F90

seems to be the same as the one from
$ git show REBASE_HEAD

This is a little confusing to me, in the sense that I don't understand why 
the merge-base is what it is. At this point, I do

$ git merge-base HEAD REBASE_HEAD 
f4e1ae67b7d6ca36c6f3ea7c9da43d81caf24067

Ok, so 'git merge-base' finds that the merge-base between HEAD and
REBASE_HEAD is HEAD; this makes sense to me (no previous commits
have been rewritten so far, so REBASE_HEAD is directly ahead of HEAD).
But, if I try to find the commit that contains
the blob ee4377f1ec6836fa05573976a473373906c37d9f (index 1), I find
REBASE_HEAD's parent:

$ git log --all --find-object=ee4377f1ec6836fa05573976a473373906c37d9f --format='commmit %H%ntree %T%nparent %P%n%n    %s%n'
commmit e8bfa557d3c81b75116d6557784b0439b792a308
tree f6fecb8193c3b877f22bcb8f4d8d2c203e17f06f
parent 7a8d5a82984dfedd7fac1d7ed7c7fbd1781c1f61

    fixup! Add Anderson acceleration for Picard iteration

commmit 7a8d5a82984dfedd7fac1d7ed7c7fbd1781c1f61
tree 11fd096851015c0c16b793d9bbb5db039776483b
parent 63d4c73c1dd973f620307833bd363a1d5069d090

    ice_dyn_vp: introduce 'CICE_USE_LAPACK' preprocessor macro

$ blob=ee4377f1ec6836fa05573976a473373906c37d9f
$ parent=7a8d5a82984dfedd7fac1d7ed7c7fbd1781c1f61
$ git ls-tree -r $parent^{tree} | grep $blob
100644 blob ee4377f1ec6836fa05573976a473373906c37d9f	cicecore/cicedynB/dynamics/ice_dyn_vp.F90

I don't understand why at this point of the rebase, Git 
determines that the merge-base between HEAD and 
REBASE_HEAD is REBASE_HEAD's parent... this commit
is not even an ancestor of HEAD (or maybe I don't understand
what the "merge-base" is in this context?)...

> The more interesting difference is from the
> merge base to "ours" (index 2), seen with:
> 
> $ git diff :1:cicecore/cicedynB/dynamics/ice_dyn_vp.F90
> :2:cicecore/cicedynB/dynamics/ice_dyn_vp.F90
> 
> If you search for "anderson_solver" in that diff, you see that the
> reconstructed diff looks like "subroutine anderson_solver" is renamed
> to "subroutine picard_solver" with similar but slightly different
> arguments.  The diff routine successfully finds lots of lines common
> between these two subroutines so it seems quite logical as a way of
> representing the changes, especially since they occur near each other
> in the order of the file.  If you keep looking further in the diff, it
> says that a few hundred lines later there's another subroutine rename,
> this time from "subroutine calc_zeta_Pr" to "subroutine
> anderson_solver".  Looking at it this way, it's quite natural to apply
> the change from the rebased commit to the picard_solver since the
> other side of history "renamed" anderson_solver to picard_solver.
> 
> This might not make as much sense to you, since you tend to think in
> terms of parse trees and "subroutines" are a high level unit.  But
> diffs work in terms of lines and have no knowledge of any kind of file
> structure or semantics.  If you have a string of functions A,B,C and
> insert a new function Z on one side of history somewhere in a file
> that happens to look a lot like existing functions (meaning several
> identical lines between the two), then diffs won't necessarily treat
> it as a contiguous block of inserted lines but instead compare
> function Z on one side to function B on the other, then function B on
> the first side to function C on the second, etc.
> 
> There are alternative diff algorithms that try to minimize the number
> of changed lines shown by the diff, such as --histogram and
> --patience, and they do shrink this particular diff, but they both
> yield the same treatment of comparing "subroutine anderson_solver" to
> "subroutine picard_solver" essentially treating it as a rename.  To
> get a merge like you want, you'd need some kind of higher level
> semantic merge, or at least need functions to not have so many lines
> in common between them.

Thanks for these explanations. The behaviour makes sense.

> 
> This kind of problem also causes big issues when reordering functions
> in a file.  line-by-line diffs and the diff3 merge algorithm tend to
> really struggle with those.  (See e.g.
> https://www.cis.upenn.edu/~bcpierce/papers/diff3-short.pdf if you want
> to read up more on these.)

Thanks for the pointer! 

> 
> Hope that helps,
> Elijah

Thanks,
Philippe.


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: rebase - "fixup!" conflict applied at the wrong code location, why ?
  2020-07-14  2:42   ` Philippe Blain
@ 2020-07-14  5:45     ` Elijah Newren
  0 siblings, 0 replies; 5+ messages in thread
From: Elijah Newren @ 2020-07-14  5:45 UTC (permalink / raw)
  To: Philippe Blain; +Cc: git

Hi Philippe,

On Mon, Jul 13, 2020 at 8:42 PM Philippe Blain
<levraiphilippeblain@gmail.com> wrote:
>
> Hi Elijah,
>
> > Le 9 juil. 2020 à 00:07, Elijah Newren <newren@gmail.com> a écrit :
> >
> > On Wed, Jul 8, 2020 at 5:07 PM Philippe Blain
> > <levraiphilippeblain@gmail.com> wrote:
> >>
> >> Hello,
> >>
> >> I've been working on a branch for a while. I've been using `git commit --fixup`  and `git commit --squash`
> >> when I noticed that I had forgotten to add something to a previous commit.
> >> Today I did `git rebase --autosquash` to clean up my history, and the rebase failed at the
> >> first 'fixup!' commit with a conflict. However, the conflict is not located at the right place
> >> in the code (it's not in the right subroutine!). This is very surprising to me, and I would
> >> like to understand why it happens.
> >>
> >> Steps to reproduce:
> >>
> >> git clone -b branch-to-be-rebased https://github.com/phil-blain/CICE.git cice
> >> cd cice
> >> git rebase -i --autosquash my-first-commit
> >> # save the todo list without modifications
> >>  Auto-merging <file>
> >>  CONFLICT (content): Merge conflict in <file>
> >>  error: could not apply e8bfa55... fixup! <commit message of f4e1ae6>
> >> # the rebase stops at f4e1ae6
> >> git diff
> >> # tangential question : for some reason the hunk header does not appear here, I don't know why...
> >> git diff -2  # but it appears here
> >> git grep -p -e '<<<<<<< HEAD' -e '>>>>>>> e8bfa55...'  # or here
> >> # ok, the conflict appears in subroutine 'picard_solver'
> >> git show REBASE_HEAD -U5
> >> # but the original "fixup!" commit only modifies the subroutine 'anderson_solver' !!
> >>
> >> I would have expected that the conflict be created around lines 1118-1132
> >> (line numbers in f4e1ae6), in the 'anderson_solver' subroutine.
> >>
> >> I don't know if this plays a part here, but commit f4e1ae6 (where the rebase stops)
> >> is the commit where the 'anderson_solver' subroutine is added to the code...
> >>
> >> Thanks,
> >>
> >> Philippe.
> >
> > If you take a look where the rebase stops, you see:
>
> First, thanks a lot for your answer. I have a few questions.
>
> >
> > $ git ls-files -u
> > 100644 ee4377f1ec6836fa05573976a473373906c37d9f 1
> > cicecore/cicedynB/dynamics/ice_dyn_vp.F90
> > 100644 30c699ac371c2a751052fa98d04317e84a96ec47 2
> > cicecore/cicedynB/dynamics/ice_dyn_vp.F90
> > 100644 276f224e9048fe0bbd7c25822695049547362c87 3
> > cicecore/cicedynB/dynamics/ice_dyn_vp.F90
> >
> > The difference from the merge base to "other" (index 3) is pretty
> > tiny, you just moved one line in the "anderson_solver" subroutine
> > about 10 lines down.
>
> Yes, the output from
> $ git diff :1:cicecore/cicedynB/dynamics/ice_dyn_vp.F90 :3:cicecore/cicedynB/dynamics/ice_dyn_vp.F90
>
> seems to be the same as the one from
> $ git show REBASE_HEAD
>
> This is a little confusing to me, in the sense that I don't understand why
> the merge-base is what it is. At this point, I do
>
> $ git merge-base HEAD REBASE_HEAD
> f4e1ae67b7d6ca36c6f3ea7c9da43d81caf24067
>
> Ok, so 'git merge-base' finds that the merge-base between HEAD and
> REBASE_HEAD is HEAD; this makes sense to me (no previous commits
> have been rewritten so far, so REBASE_HEAD is directly ahead of HEAD).
> But, if I try to find the commit that contains
> the blob ee4377f1ec6836fa05573976a473373906c37d9f (index 1), I find
> REBASE_HEAD's parent:
>
> $ git log --all --find-object=ee4377f1ec6836fa05573976a473373906c37d9f --format='commmit %H%ntree %T%nparent %P%n%n    %s%n'
> commmit e8bfa557d3c81b75116d6557784b0439b792a308
> tree f6fecb8193c3b877f22bcb8f4d8d2c203e17f06f
> parent 7a8d5a82984dfedd7fac1d7ed7c7fbd1781c1f61
>
>     fixup! Add Anderson acceleration for Picard iteration
>
> commmit 7a8d5a82984dfedd7fac1d7ed7c7fbd1781c1f61
> tree 11fd096851015c0c16b793d9bbb5db039776483b
> parent 63d4c73c1dd973f620307833bd363a1d5069d090
>
>     ice_dyn_vp: introduce 'CICE_USE_LAPACK' preprocessor macro
>
> $ blob=ee4377f1ec6836fa05573976a473373906c37d9f
> $ parent=7a8d5a82984dfedd7fac1d7ed7c7fbd1781c1f61
> $ git ls-tree -r $parent^{tree} | grep $blob
> 100644 blob ee4377f1ec6836fa05573976a473373906c37d9f    cicecore/cicedynB/dynamics/ice_dyn_vp.F90
>
> I don't understand why at this point of the rebase, Git
> determines that the merge-base between HEAD and
> REBASE_HEAD is REBASE_HEAD's parent... this commit
> is not even an ancestor of HEAD (or maybe I don't understand
> what the "merge-base" is in this context?)...

First, remember that this is a rebase or cherry-pick, and not a merge
(we are not creating a new merge commit, but just a commit that
"cherry-picks" the changes from one commit).  But since the wording
and the machinery comes from a merge, let's start by discussing how
merge works.

When you merge two branches (e.g. 'git merge other' will merge branch
'other' into HEAD), git determines the merge-base, i.e. the point of
common history between two commits.  For simplicity, let's assume
there is a common point of history and exactly one common point of
history for these two commits.  Given these assumptions, we have a
merge-base and two commits.  Let's label the merge-base as A, the
commit at HEAD 'B', and the commit at 'other', 'C'.  So we are doing a
three-way merge between A, B, and C.

The thing about a three-way merge is besides thinking of it as a
three-way merge of A, B, and C, there are two other equivalent ways to
look at it: applying the changes between A and B to C, or applying the
changes between A and C to B.  These alternate ways of looking at
things come in handy when looking at rebases and cherry-picks.

When cherry-picking or rebasing a commit, we do NOT want to use the
normal merge-base of HEAD and the commit being picked; if we did that,
it would merge ALL the changes in the history of the commit being
picked into the current branch instead of just the changes from the
one commit we want to cherry-pick/rebase.  Noting that we only want
the changes from that one commit, we can word it similarly to above by
stating that we want to apply the changes between REBASE_HEAD^1 and
REBASE_HEAD to HEAD.  Looking at the above wording for three-way
merges ("applying the changes between A and C to B"), that means we
want to do a three-way merge of REBASE_HEAD^1, HEAD, and REBASE_HEAD.
And when we're done and create a new commit, we do not make a merge
commit but instead record HEAD as the only parent.  Hopefully, this
makes it clear why we use the parent of REBASE_HEAD as the
"merge-base" in the merge machinery; it's not really a "merge-base"
because we're not doing a merge, but we're using the merge machinery
with a special commit as the merge-base because it gives the results
we want.

Hope that helps,
Elijah

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2020-07-14  5:46 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-07-08 23:03 rebase - "fixup!" conflict applied at the wrong code location, why ? Philippe Blain
2020-07-08 23:07 ` Philippe Blain
2020-07-09  4:07 ` Elijah Newren
2020-07-14  2:42   ` Philippe Blain
2020-07-14  5:45     ` Elijah Newren

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).