git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* git-rebase produces incorrect output
@ 2019-11-29  8:21 Pavel Roskin
  2019-11-29 13:31 ` Philip Oakley
  2019-11-30  4:22 ` Elijah Newren
  0 siblings, 2 replies; 6+ messages in thread
From: Pavel Roskin @ 2019-11-29  8:21 UTC (permalink / raw)
  To: git

Hi!

I've discovered an issue with "git rebase" producing a subtly
incorrect file. In fact, that files even compiled but failed in unit
tests! That's so scary that I'm going to stop using "git rebase" for
now. Fortunately, "git rebase --merge" is working correctly, so I'll
use it. Too bad there is no option to use "--merge" by default.

The issue was observed in git 2.23 and reproduced in today's next
branch (2.24.0.449.g4c06f74957) on up-to-date Fedora 31 x86_64.

I've created a repository that demonstrates the issue:
https://github.com/proski/git-rebase-demo

The branch names should be self-explanatory. "master" is the base,
"branch1" and "branch2" contain one change each. If "branch1" is
rebased on top of "branch2", the result is incorrect, saved in the
"rebase-bad" branch. If "git rebase -m" is used, the result is
correct, saved in the "merge-good" branch.

The files in "rebase-bad" and "merge-good" have exactly the same lines
but in a different order. Yet the changes on branch1 and branch2
affect non-overlapping parts of the file. There should be no doubt how
the merged code should look like.

I believe the change on branch2 shifts the lines, so that the first
change from branch1 is applies to a place below the intended location,
and then git goes back to an earlier line to apply the next hunk. I
can imagine that it would do the right thing in case of swapped blocks
of code. Yet I have a real life example where it does a very wrong
thing.

Indeed, "git diff origin/branch2 origin/rebase-bad" and "git diff
origin/branch2 origin/merge-good" both produce diffs of 9957 bytes
long, different only in the order of the hunks.

Another interesting data point - "git rebase --interactive" is working
correctly.

-- 
Regards,
Pavel Roskin

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: git-rebase produces incorrect output
  2019-11-29  8:21 git-rebase produces incorrect output Pavel Roskin
@ 2019-11-29 13:31 ` Philip Oakley
  2019-11-29 21:28   ` Pavel Roskin
  2019-11-30  4:22 ` Elijah Newren
  1 sibling, 1 reply; 6+ messages in thread
From: Philip Oakley @ 2019-11-29 13:31 UTC (permalink / raw)
  To: Pavel Roskin, git

On 29/11/2019 08:21, Pavel Roskin wrote:
> Hi!
>
> I've discovered an issue with "git rebase" producing a subtly
> incorrect file. In fact, that files even compiled but failed in unit
> tests! That's so scary that I'm going to stop using "git rebase" for
> now. Fortunately, "git rebase --merge" is working correctly, so I'll
> use it. Too bad there is no option to use "--merge" by default.
>
> The issue was observed in git 2.23 and reproduced in today's next
> branch (2.24.0.449.g4c06f74957) on up-to-date Fedora 31 x86_64.
>
> I've created a repository that demonstrates the issue:
> https://github.com/proski/git-rebase-demo
>
> The branch names should be self-explanatory. "master" is the base,
> "branch1" and "branch2" contain one change each. If "branch1" is
> rebased on top of "branch2", the result is incorrect, saved in the
> "rebase-bad" branch. If "git rebase -m" is used, the result is
> correct, saved in the "merge-good" branch.
>
> The files in "rebase-bad" and "merge-good" have exactly the same lines
> but in a different order. Yet the changes on branch1 and branch2
> affect non-overlapping parts of the file. There should be no doubt how
> the merged code should look like.
>
> I believe the change on branch2 shifts the lines, so that the first
> change from branch1 is applies to a place below the intended location,
> and then git goes back to an earlier line to apply the next hunk. I
> can imagine that it would do the right thing in case of swapped blocks
> of code. Yet I have a real life example where it does a very wrong
> thing.
>
> Indeed, "git diff origin/branch2 origin/rebase-bad" and "git diff
> origin/branch2 origin/merge-good" both produce diffs of 9957 bytes
> long, different only in the order of the hunks.
>
> Another interesting data point - "git rebase --interactive" is working
> correctly.
>
Which specific lines is this on?

Using the Github compare facility [1], I see multiple changes, some of 
which are probably just noise from the example.
  https://github.com/proski/git-rebase-demo/compare/merge-good...rebase-bad

Philip

[1] 
https://help.github.com/en/github/committing-changes-to-your-project/comparing-commits-across-time

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: git-rebase produces incorrect output
  2019-11-29 13:31 ` Philip Oakley
@ 2019-11-29 21:28   ` Pavel Roskin
  0 siblings, 0 replies; 6+ messages in thread
From: Pavel Roskin @ 2019-11-29 21:28 UTC (permalink / raw)
  To: Philip Oakley; +Cc: git

Hello Philip,

I just found out that the true diff on Github should use two dots, not three:
https://github.com/proski/git-rebase-demo/compare/merge-good..rebase-bad

The first difference is on line 187 in
getPullRequests_throws_on_not_found(). The correct change should have
"nullValue" (see the diff between "master" and "branch1"), but
"rebase-bad" has "SocketException" there instead.

[Sorry for the private email that I sent by mistake]

Pavel

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: git-rebase produces incorrect output
  2019-11-29  8:21 git-rebase produces incorrect output Pavel Roskin
  2019-11-29 13:31 ` Philip Oakley
@ 2019-11-30  4:22 ` Elijah Newren
  2019-11-30 16:37   ` Elijah Newren
  2019-11-30 17:58   ` Junio C Hamano
  1 sibling, 2 replies; 6+ messages in thread
From: Elijah Newren @ 2019-11-30  4:22 UTC (permalink / raw)
  To: Pavel Roskin; +Cc: Git Mailing List

On Fri, Nov 29, 2019 at 12:24 AM Pavel Roskin <plroskin@gmail.com> wrote:
>
> Hi!
>
> I've discovered an issue with "git rebase" producing a subtly
> incorrect file. In fact, that files even compiled but failed in unit
> tests! That's so scary that I'm going to stop using "git rebase" for
> now. Fortunately, "git rebase --merge" is working correctly, so I'll
> use it. Too bad there is no option to use "--merge" by default.

Indeed.  We really should fix that, if not just make it the default
for everyone.

> The issue was observed in git 2.23 and reproduced in today's next
> branch (2.24.0.449.g4c06f74957) on up-to-date Fedora 31 x86_64.
>
> I've created a repository that demonstrates the issue:
> https://github.com/proski/git-rebase-demo
>
> The branch names should be self-explanatory. "master" is the base,
> "branch1" and "branch2" contain one change each. If "branch1" is
> rebased on top of "branch2", the result is incorrect, saved in the
> "rebase-bad" branch. If "git rebase -m" is used, the result is
> correct, saved in the "merge-good" branch.
>
> The files in "rebase-bad" and "merge-good" have exactly the same lines
> but in a different order. Yet the changes on branch1 and branch2
> affect non-overlapping parts of the file. There should be no doubt how
> the merged code should look like.
>
> I believe the change on branch2 shifts the lines, so that the first
> change from branch1 is applies to a place below the intended location,
> and then git goes back to an earlier line to apply the next hunk. I
> can imagine that it would do the right thing in case of swapped blocks
> of code. Yet I have a real life example where it does a very wrong
> thing.
>
> Indeed, "git diff origin/branch2 origin/rebase-bad" and "git diff
> origin/branch2 origin/merge-good" both produce diffs of 9957 bytes
> long, different only in the order of the hunks.
>
> Another interesting data point - "git rebase --interactive" is working
> correctly.

Thanks for the detailed report and simple testcase.  Turns out the
--interactive isn't so interesting, because a few cycles back we
re-implemented the --merge behavior on top of the interactive
machinery so the two use the exact same engine.  Anyway, I can
duplicate the problem and noticed a few interesting things.  Since the
am-backend for rebase (the default) basically just uses diff and
apply, I tried duplicating with just those after looking at things and
noticing that it appeared to be applying patch hunks on the wrong
lines:

$ git switch branch2

$ git reset --hard origin/branch2
HEAD is now at 1331204 Change on branch 2
$ git diff -U3 origin/master origin/branch1 >diff.patch
$ git apply diff.patch
$ git diff --shortstat origin/merge-good
 1 file changed, 43 insertions(+), 43 deletions(-)
$ git diff --shortstat origin/rebase-bad

So, this reproduces your bad results.  Let's repeat with -U4:

$ git reset --hard origin/branch2
HEAD is now at 1331204 Change on branch 2
$ git diff -U4 origin/master origin/branch1 >diff.patch
$ git apply diff.patch
$ git diff --shortstat origin/merge-good
 1 file changed, 10 insertions(+), 10 deletions(-)
$ git diff --shortstat origin/rebase-bad
 1 file changed, 37 insertions(+), 37 deletions(-)

That gives us a result that matches neither merge-good nor rebase-bad,
but is closer to the good side.  Let's try again with -U5:

$ git reset --hard origin/branch2
HEAD is now at 1331204 Change on branch 2
$ git diff -U5 origin/master origin/branch1 >diff.patch
$ git apply diff.patch
$ git diff --shortstat origin/merge-good
$ git diff --shortstat origin/rebase-bad
 1 file changed, 43 insertions(+), 43 deletions(-)

Ahah!  With five lines of context, git diff & git apply can produce
the correct result.

Sadly, I tried to force this with git rebase, but -C5 only affected
the apply side and there's no option to pass to rebase to pass through
-U5 to the diff logic.  Also, although there is a diff.context config
option, git-am ignores it (Note that git_am_config() does not directly
check that value and it calls git_default_config(), not
git_diff_ui_config() or even git_diff_basic_config()).  So, it's not
possible to force the am-based rebase to get the right answer
currently even if you figure out what the problem is.

The merge-based rebase, by contrast, essentially benfits from having
the entire files of each version accessible so it automatically gets
it right.


So, to summarize here:
  * you have a case where the default 3 lines of context mess stuff
up; but rebase --merge works great
  * am doesn't have a -U option, and ignores the diff.context setting,
making it impossible to force the am backend to work on your case
and also:
  * rebase doesn't have an option to use the merge/interactive backend
by default (nor an --am option to override it)
Also,
  * The performance of the merge/interactive backend is slightly
better than the am-backend
(https://public-inbox.org/git/CABPp-BF=ev03WgODk6TMQmuNoatg2kiEe5DR__gJ0OTVqHSnfQ@mail.gmail.com/)
and will be getting better
  * The merge/interactive backend supports many more options than the
am-backend, though the am one still has a few the merge backend
doesn't.  Once the ra/rebase-i-more-options topic merges, --whitespace
will be the only consequential option that the am-backend supports
that the merge/interactive-backend doesn't.  (There's also -C, but as
noted above, the merge/interactive backend already have access to the
full file).

Maybe we should just switch the default, for everyone?  (And provide
an --am option to override it and a config setting to get the old
default?)

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: git-rebase produces incorrect output
  2019-11-30  4:22 ` Elijah Newren
@ 2019-11-30 16:37   ` Elijah Newren
  2019-11-30 17:58   ` Junio C Hamano
  1 sibling, 0 replies; 6+ messages in thread
From: Elijah Newren @ 2019-11-30 16:37 UTC (permalink / raw)
  To: Pavel Roskin
  Cc: Git Mailing List, Phillip Wood, Johannes Schindelin,
	Junio C Hamano

On Fri, Nov 29, 2019 at 8:22 PM Elijah Newren <newren@gmail.com> wrote:
> On Fri, Nov 29, 2019 at 12:24 AM Pavel Roskin <plroskin@gmail.com> wrote:

> So, to summarize here:
>   * you have a case where the default 3 lines of context mess stuff
> up; but rebase --merge works great
>   * am doesn't have a -U option, and ignores the diff.context setting,
> making it impossible to force the am backend to work on your case
>   * rebase doesn't have an option to use the merge/interactive backend
> by default (nor an --am option to override it)
>
> Also:
>   * The performance of the merge/interactive backend is slightly
> better than the am-backend
> (https://public-inbox.org/git/CABPp-BF=ev03WgODk6TMQmuNoatg2kiEe5DR__gJ0OTVqHSnfQ@mail.gmail.com/)
> and will continue to get better
>   * The merge/interactive backend supports many more options than the
> am-backend, though the am one still has a few the merge backend
> doesn't.  Once the ra/rebase-i-more-options topic merges, --whitespace
> will be the only consequential option that the am-backend supports
> that the merge/interactive-backend doesn't.  (There's also -C, but as
> noted above, the merge/interactive backend already have access to the
> full file).

In case it wasn't clear above, the merge/interactive backend could
just accept the -C option and ignore it and do nothing, since it
already has access to the full file (thus why I consider the -C option
to not be consequential).

Also, I remembered and dug out a few more items about the default
rebase backend that might be worth including in this summary:
  * The am backend operates with incomplete tree information as well,
limiting what the merge/resolve/whatever can do and what information
can be provided to the user (see
https://public-inbox.org/git/xmqqh8jeh1id.fsf@gitster-ct.c.googlers.com/)
  * The interactive backend, although slightly faster than the
am-backend (on p3400 at least), is slightly slower with split-index
which hasn't yet been investigated (see
https://public-inbox.org/git/nycvar.QRO.7.76.6.1901312310280.41@tvgsbejvaqbjf.bet/)

> Maybe we should just switch the default, for everyone?  (And provide
> an --am option to override it and a config setting to get the old
> default?)

CC'ing a few folks for opinions on switching the default backend of
rebase from --am to --merge.  Johannes already agreed it was the right
path eventually[1], and Junio suggested the am backend should be
deprecated[2] and eventually replaced, so I was going to push on this
after some merge performance work but perhaps it's a good time to
query if it's time to switch the default sooner.

[1] See the end of
https://public-inbox.org/git/nycvar.QRO.7.76.6.1808311158540.71@tvgsbejvaqbjf.bet/,
also linked above.
[2] https://public-inbox.org/git/xmqqh8jeh1id.fsf@gitster-ct.c.googlers.com/,
also linked above.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: git-rebase produces incorrect output
  2019-11-30  4:22 ` Elijah Newren
  2019-11-30 16:37   ` Elijah Newren
@ 2019-11-30 17:58   ` Junio C Hamano
  1 sibling, 0 replies; 6+ messages in thread
From: Junio C Hamano @ 2019-11-30 17:58 UTC (permalink / raw)
  To: Elijah Newren; +Cc: Pavel Roskin, Git Mailing List

Elijah Newren <newren@gmail.com> writes:

> Sadly, I tried to force this with git rebase, but -C5 only affected
> the apply side and there's no option to pass to rebase to pass through
> -U5 to the diff logic.  Also, although there is a diff.context config
> option, git-am ignores it (Note that git_am_config() does not directly
> check that value and it calls git_default_config(), not
> git_diff_ui_config() or even git_diff_basic_config()).

Not essential but puzzled.  The context applies to the generation
side, not acceptance side, no?  IOW, I suspect that you are talking
about "git format-patch" that sits on the upstream side of the pipe
that feeds "git am".

> So, to summarize here:
>   * you have a case where the default 3 lines of context mess stuff
> up; but rebase --merge works great
>   * am doesn't have a -U option, and ignores the diff.context setting,
> making it impossible to force the am backend to work on your case
> and also:

I do not think it is super hard to teach "git rebase" to pass
backend specific options so that "git rebase--am" can be told to
work with wider context (which will reduce the risk of ambiguous
patch like this example, trading the increased risk of unnecessary
conflicts; it is a good trade-off most of the time for added safety,
as nobody wants a system that produces a wrong result silently and
quickly).

Having said that,

>   * rebase doesn't have an option to use the merge/interactive backend
> by default (nor an --am option to override it)

I think addition of rebase.backend would be a good first step for
eventually flipping the default, which by the way I have no trouble
with.

> Maybe we should just switch the default, for everyone?  (And provide
> an --am option to override it and a config setting to get the old
> default?)

Yes, that would be a sensible second step.  I actually think a
longer term goal is to deprecate the am backend.  It was invented
first and then kept to be the default backend for a long time
because the merge based backend historically has been noticeably
slow (it was expected to be---it was essentially a shell script that
run cherry-pick repeatedly in a loop).  In some future, it would
outlive its usefulness, and that I think that that future is just
around the corner.



^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2019-11-30 17:58 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-11-29  8:21 git-rebase produces incorrect output Pavel Roskin
2019-11-29 13:31 ` Philip Oakley
2019-11-29 21:28   ` Pavel Roskin
2019-11-30  4:22 ` Elijah Newren
2019-11-30 16:37   ` Elijah Newren
2019-11-30 17:58   ` Junio C Hamano

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).