git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* Unreliable 'git rebase --onto'
@ 2020-01-08 21:43 Eugeniu Rosca
  2020-01-08 22:35 ` SZEDER Gábor
       [not found] ` <CABPp-BHsyMOz+hi7EYoAnAWfzms7FRfwqCoarnu8H+vyDoN6SQ@mail.gmail.com>
  0 siblings, 2 replies; 22+ messages in thread
From: Eugeniu Rosca @ 2020-01-08 21:43 UTC (permalink / raw)
  To: gitster, peff, avarab, git; +Cc: Eugeniu Rosca, Eugeniu Rosca

Hello Git community,

Below is a simple reproduction scenario for what looks to be a bug (?)
in 'git rebase --onto' (v2.25.0-rc1-19-g042ed3e048af).

I would appreciate your confirmation of the misbehavior.
If the behavior is correct/expected, I would appreciate some feedback
how to avoid it in future, since it occurs with the default parameters.

1. git clone https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

2. ### Cherry pick an upstream commit, to contrast the results with
   'git rebase --onto':
   $ git checkout -b v4.18-cherry-pick v4.18
   $ git cherry-pick 463fa44eec2fef50
   Auto-merging drivers/input/touchscreen/atmel_mxt_ts.c
   warning: inexact rename detection was skipped due to too many files.
   warning: you may want to set your merge.renamelimit variable to at least 7216 and retry the command.
   [v4.18-cherry-pick bd142b45bf3a] Input: atmel_mxt_ts - disable IRQ across suspend
    Author: Evan Green <evgreen@chromium.org>
    Date: Wed Oct 2 14:00:21 2019 -0700
    1 file changed, 4 insertions(+)

3. ### In spite of the warning, the result matches the original commit:
   $ vimdiff <(git show 463fa44eec2fef50) <(git show v4.18-cherry-pick)

4. ### Now, backport the same commit via 'git rebase --onto'
   $ git rebase --onto v4.18 463fa44eec2fef50~ 463fa44eec2fef50
   First, rewinding head to replay your work on top of it...
   Applying: Input: atmel_mxt_ts - disable IRQ across suspend

5. ### The result is different:
   $ git branch v4.18-rebase-onto
   $ git diff v4.18-cherry-pick v4.18-rebase-onto

diff --git a/drivers/input/touchscreen/atmel_mxt_ts.c b/drivers/input/touchscreen/atmel_mxt_ts.c
index b45958e89cc5..2345b587662b 100644
--- a/drivers/input/touchscreen/atmel_mxt_ts.c
+++ b/drivers/input/touchscreen/atmel_mxt_ts.c
@@ -3139,8 +3139,6 @@ static int __maybe_unused mxt_suspend(struct device *dev)
 
 	mutex_unlock(&input_dev->mutex);
 
-	disable_irq(data->irq);
-
 	return 0;
 }
 
@@ -3162,6 +3160,8 @@ static int __maybe_unused mxt_resume(struct device *dev)
 
 	mutex_unlock(&input_dev->mutex);
 
+	disable_irq(data->irq);
+
 	return 0;
 }


In a nutshell, purely from user's perspective:
 - I get a warning from 'git cherry pick', with perfect results
 - I get no warning from 'git rebase --onto', with wrong results

Does git still behave expectedly? TIA!

-- 
Best Regards,
Eugeniu

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: Unreliable 'git rebase --onto'
  2020-01-08 21:43 Unreliable 'git rebase --onto' Eugeniu Rosca
@ 2020-01-08 22:35 ` SZEDER Gábor
  2020-01-09  0:55   ` Elijah Newren
  2020-01-09 11:13   ` Unreliable 'git rebase --onto' Eugeniu Rosca
       [not found] ` <CABPp-BHsyMOz+hi7EYoAnAWfzms7FRfwqCoarnu8H+vyDoN6SQ@mail.gmail.com>
  1 sibling, 2 replies; 22+ messages in thread
From: SZEDER Gábor @ 2020-01-08 22:35 UTC (permalink / raw)
  To: Eugeniu Rosca, Elijah Newren; +Cc: gitster, peff, avarab, git, Eugeniu Rosca

On Wed, Jan 08, 2020 at 10:43:49PM +0100, Eugeniu Rosca wrote:
> Hello Git community,
> 
> Below is a simple reproduction scenario for what looks to be a bug (?)
> in 'git rebase --onto' (v2.25.0-rc1-19-g042ed3e048af).
> 
> I would appreciate your confirmation of the misbehavior.
> If the behavior is correct/expected, I would appreciate some feedback
> how to avoid it in future, since it occurs with the default parameters.
> 
> 1. git clone https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
> 
> 2. ### Cherry pick an upstream commit, to contrast the results with
>    'git rebase --onto':
>    $ git checkout -b v4.18-cherry-pick v4.18
>    $ git cherry-pick 463fa44eec2fef50
>    Auto-merging drivers/input/touchscreen/atmel_mxt_ts.c
>    warning: inexact rename detection was skipped due to too many files.
>    warning: you may want to set your merge.renamelimit variable to at least 7216 and retry the command.
>    [v4.18-cherry-pick bd142b45bf3a] Input: atmel_mxt_ts - disable IRQ across suspend
>     Author: Evan Green <evgreen@chromium.org>
>     Date: Wed Oct 2 14:00:21 2019 -0700
>     1 file changed, 4 insertions(+)
> 
> 3. ### In spite of the warning, the result matches the original commit:
>    $ vimdiff <(git show 463fa44eec2fef50) <(git show v4.18-cherry-pick)
> 
> 4. ### Now, backport the same commit via 'git rebase --onto'
>    $ git rebase --onto v4.18 463fa44eec2fef50~ 463fa44eec2fef50
>    First, rewinding head to replay your work on top of it...
>    Applying: Input: atmel_mxt_ts - disable IRQ across suspend
> 
> 5. ### The result is different:
>    $ git branch v4.18-rebase-onto
>    $ git diff v4.18-cherry-pick v4.18-rebase-onto
> 
> diff --git a/drivers/input/touchscreen/atmel_mxt_ts.c b/drivers/input/touchscreen/atmel_mxt_ts.c
> index b45958e89cc5..2345b587662b 100644
> --- a/drivers/input/touchscreen/atmel_mxt_ts.c
> +++ b/drivers/input/touchscreen/atmel_mxt_ts.c
> @@ -3139,8 +3139,6 @@ static int __maybe_unused mxt_suspend(struct device *dev)
>  
>  	mutex_unlock(&input_dev->mutex);
>  
> -	disable_irq(data->irq);
> -
>  	return 0;
>  }
>  
> @@ -3162,6 +3160,8 @@ static int __maybe_unused mxt_resume(struct device *dev)
>  
>  	mutex_unlock(&input_dev->mutex);
>  
> +	disable_irq(data->irq);
> +
>  	return 0;
>  }
> 
> 
> In a nutshell, purely from user's perspective:
>  - I get a warning from 'git cherry pick', with perfect results
>  - I get no warning from 'git rebase --onto', with wrong results
> 
> Does git still behave expectedly? TIA!

This is a known issue with the 'am' backend of 'git rebase'.

The good news is that work is already well under way to change the
default backend from 'am' to 'merge', which will solve this issue.
From the log message of aa523de170 (rebase: change the default backend
from "am" to "merge", 2019-12-24):

  The am-backend drops information and thus limits what we can do:
  [...]
    * reduction in context from only having a few lines beyond those
      changed means that when context lines are non-unique we can apply
      patches incorrectly.[2]
  [...]
  [2] https://lore.kernel.org/git/CABPp-BGiu2nVMQY_t-rnFR5GQUz_ipyEE8oDocKeO+>

Alas, there is unexpected bad news: with that commit the runtime of
your 'git rebase --onto' command goes from <1sec to over 50secs.
Cc-ing Elijah, author of that patch...


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Unreliable 'git rebase --onto'
  2020-01-08 22:35 ` SZEDER Gábor
@ 2020-01-09  0:55   ` Elijah Newren
  2020-01-09 15:03     ` SZEDER Gábor
  2020-01-09 11:13   ` Unreliable 'git rebase --onto' Eugeniu Rosca
  1 sibling, 1 reply; 22+ messages in thread
From: Elijah Newren @ 2020-01-09  0:55 UTC (permalink / raw)
  To: SZEDER Gábor
  Cc: Eugeniu Rosca, Junio C Hamano, Jeff King,
	Ævar Arnfjörð, Git Mailing List, Eugeniu Rosca

user 0m9.644s
sys 0m3.620s
On Wed, Jan 8, 2020 at 2:36 PM SZEDER Gábor <szeder.dev@gmail.com> wrote:
>
> On Wed, Jan 08, 2020 at 10:43:49PM +0100, Eugeniu Rosca wrote:
> > Hello Git community,
> >
> > Below is a simple reproduction scenario for what looks to be a bug (?)
> > in 'git rebase --onto' (v2.25.0-rc1-19-g042ed3e048af).
> >
> > I would appreciate your confirmation of the misbehavior.
> > If the behavior is correct/expected, I would appreciate some feedback
> > how to avoid it in future, since it occurs with the default parameters.
> >
> > 1. git clone https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
> >
> > 2. ### Cherry pick an upstream commit, to contrast the results with
> >    'git rebase --onto':
> >    $ git checkout -b v4.18-cherry-pick v4.18
> >    $ git cherry-pick 463fa44eec2fef50
> >    Auto-merging drivers/input/touchscreen/atmel_mxt_ts.c
> >    warning: inexact rename detection was skipped due to too many files.
> >    warning: you may want to set your merge.renamelimit variable to at least 7216 and retry the command.

Lots of renames...

> >    [v4.18-cherry-pick bd142b45bf3a] Input: atmel_mxt_ts - disable IRQ across suspend
> >     Author: Evan Green <evgreen@chromium.org>
> >     Date: Wed Oct 2 14:00:21 2019 -0700
> >     1 file changed, 4 insertions(+)
> >
> > 3. ### In spite of the warning, the result matches the original commit:
> >    $ vimdiff <(git show 463fa44eec2fef50) <(git show v4.18-cherry-pick)
> >
> > 4. ### Now, backport the same commit via 'git rebase --onto'
> >    $ git rebase --onto v4.18 463fa44eec2fef50~ 463fa44eec2fef50
> >    First, rewinding head to replay your work on top of it...
> >    Applying: Input: atmel_mxt_ts - disable IRQ across suspend
> >
> > 5. ### The result is different:
> >    $ git branch v4.18-rebase-onto
> >    $ git diff v4.18-cherry-pick v4.18-rebase-onto
> >
> > diff --git a/drivers/input/touchscreen/atmel_mxt_ts.c b/drivers/input/touchscreen/atmel_mxt_ts.c
> > index b45958e89cc5..2345b587662b 100644
> > --- a/drivers/input/touchscreen/atmel_mxt_ts.c
> > +++ b/drivers/input/touchscreen/atmel_mxt_ts.c
> > @@ -3139,8 +3139,6 @@ static int __maybe_unused mxt_suspend(struct device *dev)
> >
> >       mutex_unlock(&input_dev->mutex);
> >
> > -     disable_irq(data->irq);
> > -
> >       return 0;
> >  }
> >
> > @@ -3162,6 +3160,8 @@ static int __maybe_unused mxt_resume(struct device *dev)
> >
> >       mutex_unlock(&input_dev->mutex);
> >
> > +     disable_irq(data->irq);
> > +
> >       return 0;
> >  }
> >
> >
> > In a nutshell, purely from user's perspective:
> >  - I get a warning from 'git cherry pick', with perfect results
> >  - I get no warning from 'git rebase --onto', with wrong results
> >
> > Does git still behave expectedly? TIA!
>
> This is a known issue with the 'am' backend of 'git rebase'.
>
> The good news is that work is already well under way to change the
> default backend from 'am' to 'merge', which will solve this issue.
> From the log message of aa523de170 (rebase: change the default backend
> from "am" to "merge", 2019-12-24):
>
>   The am-backend drops information and thus limits what we can do:
>   [...]
>     * reduction in context from only having a few lines beyond those
>       changed means that when context lines are non-unique we can apply
>       patches incorrectly.[2]
>   [...]
>   [2] https://lore.kernel.org/git/CABPp-BGiu2nVMQY_t-rnFR5GQUz_ipyEE8oDocKeO+>
>
> Alas, there is unexpected bad news: with that commit the runtime of
> your 'git rebase --onto' command goes from <1sec to over 50secs.
> Cc-ing Elijah, author of that patch...

I see slowdown, but not nearly as big as you report:

$ git checkout -b v4.18-cherry-pick v4.18
$ time git cherry-pick 463fa44eec2fef50
Auto-merging drivers/input/touchscreen/atmel_mxt_ts.c
warning: inexact rename detection was skipped due to too many files.
warning: you may want to set your merge.renamelimit variable to at
least 7216 and retry the command.
[v4.18-cherry-pick 88d39cdf3e80] Input: atmel_mxt_ts - disable IRQ
across suspend
 Author: Evan Green <evgreen@chromium.org>
 Date: Wed Oct 2 14:00:21 2019 -0700
 1 file changed, 4 insertions(+)

real 0m1.110s
user 0m0.956s
sys 0m0.284s
$ time git rebase --onto v4.18 463fa44eec2fef50~ 463fa44eec2fef5
First, rewinding head to replay your work on top of it...
Applying: Input: atmel_mxt_ts - disable IRQ across suspend

real 0m1.643s
user 0m1.296s
sys 0m0.264s
$ time git rebase -m --onto v4.18 463fa44eec2fef50~ 463fa44eec2fef50
warning: inexact rename detection was skipped due to too many files.
warning: you may want to set your merge.renamelimit variable to at
least 7216 and retry the command.
Successfully rebased and updated detached HEAD.

real 0m13.305s
user 0m9.644s
sys 0m3.620s




Interestingly, turning off rename detection only speeds it up a little bit:
$ time git rebase -m -Xno-renames --onto v4.18 463fa44eec2fef50~
463fa44eec2fef50
Successfully rebased and updated detached HEAD.

real 0m11.955s
user 0m8.732s
sys 0m3.424s


This is an interesting testcase; I'm going to try to find some time to
dig in further.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Unreliable 'git rebase --onto'
       [not found] ` <CABPp-BHsyMOz+hi7EYoAnAWfzms7FRfwqCoarnu8H+vyDoN6SQ@mail.gmail.com>
@ 2020-01-09 10:53   ` Eugeniu Rosca
  2020-01-09 18:05     ` Elijah Newren
  0 siblings, 1 reply; 22+ messages in thread
From: Eugeniu Rosca @ 2020-01-09 10:53 UTC (permalink / raw)
  To: Elijah Newren, SZEDER Gábor
  Cc: Eugeniu Rosca, Junio C Hamano, Jeff King,
	Ævar Arnfjörð, Git Mailing List, Eugeniu Rosca

Hi Elijah, hi Szeder,

On Wed, Jan 08, 2020 at 02:06:22PM -0800, Elijah Newren wrote:
> This looks like a known bug in rebase, in particular in the am-backend that
> rebase uses by default.  If I'm correct that it's just a context region
> issue, then this is the same bug that was recently discussed at
> https://lore.kernel.org/git/CAN_72e2h2avv-U9BVBYqXVKiC+5kHy-pjejyMSD3X22uRXE39g@mail.gmail.com/.
> The current plan is to switch the default over to the merge backend (the
> same machinery that cherry-pick uses), which doesn't suffer from the same
> shortcomings (you can see the current work being done in this area at
> https://lore.kernel.org/git/pull.679.v3.git.git.1577217299.gitgitgadget@gmail.com/
> ).

Thank you for your feedback and references, here and in [*].

Once hit by this or similar issues, I think there is high chance for
people to go through the same feelings as described by Pavel in [**]:

  ---
  That's so scary that I'm going to stop using "git rebase" for now.
  ---

Some years ago I was hit by 'git merge' producing slightly different
results compared to 'git rebase --onto' and 'git cherry-pick A..B'
(maybe I can come up with a reproduction scenario for that too).

Since then, I usually contrast the outcomes of merging, cherry-picking
and rebasing, to make sure they match, but that's painful and
time-consuming.

> In the mean time, you can pass the -m flag to rebase to avoid these types
> of problems.  In fact, if you could retry with -m you may be able to
> confirm whether it's the same issue.

Indeed, neither `git rebase -m` nor `git rebase -i` exhibit the problem.

[*] https://lore.kernel.org/git/CABPp-BHsy75UGm4wTOP2_AYik_dZi-_BxtAn-hyi-ZrNRRWGuw@mail.gmail.com/T/#m1cbf80ef56c260a626146d61291d7fbabd108f1b
[**] https://lore.kernel.org/git/CAN_72e2h2avv-U9BVBYqXVKiC+5kHy-pjejyMSD3X22uRXE39g@mail.gmail.com/

Thanks again.

-- 
Best Regards,
Eugeniu

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Unreliable 'git rebase --onto'
  2020-01-08 22:35 ` SZEDER Gábor
  2020-01-09  0:55   ` Elijah Newren
@ 2020-01-09 11:13   ` Eugeniu Rosca
  1 sibling, 0 replies; 22+ messages in thread
From: Eugeniu Rosca @ 2020-01-09 11:13 UTC (permalink / raw)
  To: SZEDER Gábor
  Cc: Eugeniu Rosca, Elijah Newren, gitster, peff, avarab, git,
	Eugeniu Rosca

Hi Szeder,

On Wed, Jan 08, 2020 at 11:35:57PM +0100, SZEDER Gábor wrote:
> This is a known issue with the 'am' backend of 'git rebase'.
> 
> The good news is that work is already well under way to change the
> default backend from 'am' to 'merge', which will solve this issue.
> From the log message of aa523de170 (rebase: change the default backend
> from "am" to "merge", 2019-12-24):
> 
>   The am-backend drops information and thus limits what we can do:
>   [...]
>     * reduction in context from only having a few lines beyond those
>       changed means that when context lines are non-unique we can apply
>       patches incorrectly.[2]
>   [...]
>   [2] https://lore.kernel.org/git/CABPp-BGiu2nVMQY_t-rnFR5GQUz_ipyEE8oDocKeO+>
> 
> Alas, there is unexpected bad news: with that commit the runtime of
> your 'git rebase --onto' command goes from <1sec to over 50secs.
> Cc-ing Elijah, author of that patch...

[$.02] I would personally take the route of regaining users' trust in
'git rebase' first, with fixing the performance penalty later on.

I was quite impressed by the recent 2.24.0 performance improvements,
which tells there might be room for improvement for `git rebase` too,
once it is fixed.

-- 
Best Regards,
Eugeniu

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Unreliable 'git rebase --onto'
  2020-01-09  0:55   ` Elijah Newren
@ 2020-01-09 15:03     ` SZEDER Gábor
  2020-01-09 17:53       ` Elijah Newren
  2020-01-21 19:18       ` [PATCH v1] rebase -i: stop checking out the tip of the branch to rebase Alban Gruin
  0 siblings, 2 replies; 22+ messages in thread
From: SZEDER Gábor @ 2020-01-09 15:03 UTC (permalink / raw)
  To: Elijah Newren
  Cc: Eugeniu Rosca, Junio C Hamano, Jeff King,
	Ævar Arnfjörð, Git Mailing List, Eugeniu Rosca

On Wed, Jan 08, 2020 at 04:55:46PM -0800, Elijah Newren wrote:
> > Alas, there is unexpected bad news: with that commit the runtime of
> > your 'git rebase --onto' command goes from <1sec to over 50secs.
> > Cc-ing Elijah, author of that patch...
> 
> I see slowdown, but not nearly as big as you report:

The linux repo is big, my notebook is small, the poor thing :)

> $ time git rebase -m --onto v4.18 463fa44eec2fef50~ 463fa44eec2fef50
> warning: inexact rename detection was skipped due to too many files.
> warning: you may want to set your merge.renamelimit variable to at
> least 7216 and retry the command.
> Successfully rebased and updated detached HEAD.
> 
> real 0m13.305s
> user 0m9.644s
> sys 0m3.620s

> Interestingly, turning off rename detection only speeds it up a little bit:
> $ time git rebase -m -Xno-renames --onto v4.18 463fa44eec2fef50~
> 463fa44eec2fef50
> Successfully rebased and updated detached HEAD.
> 
> real 0m11.955s
> user 0m8.732s
> sys 0m3.424s
> 
> 
> This is an interesting testcase; I'm going to try to find some time to
> dig in further.

The culprits are two seemingly unnecessary back-and-forth checkouts.

I didn't realize I could use 'git rebase -m', so ran some tests with
it, and turns out that the slowdown started with 68aa495b59 (rebase:
implement --merge via the interactive machinery, 2018-12-11), where
the runtime suddenly went from <1.5s to 45+s.

Running 'git rebase -i --onto <those-same-commits>' is just as slow,
and it appears that it has always been (the oldest I tried was
v1.8.0), and it spends a long time both before and after popping up
the editor for the rebase instructions.  That's highly suspicious, so:

  $ git log --oneline -1
  94710cac0ef4 (HEAD, tag: v4.18) Linux 4.18
  $ git rebase -i --onto v4.18 463fa44eec2fef50~ 463fa44eec2fef50
  hint: Waiting for your editor to close the file... 
  # Hit ctrl-z in the editor
  $ git log --oneline -1
  463fa44eec2f (HEAD) Input: atmel_mxt_ts - disable IRQ across suspend

Oh.

So 'git rebase -i' apparently checks out the tip commit of the
to-be-rebased revision range before invoking the editor for the rebase
instructions, only to check out the --onto commit (i.e. the commit
we've started from!) to apply the selected commit on top.

And indeed those two checkouts account for all the wasted runtime:

  $ time { git checkout 463fa44eec2fef50 && git checkout v4.18 ; }
  Updating files: 100% (49483/49483), done.
  Previous HEAD position was 94710cac0ef4 Linux 4.18
  HEAD is now at 463fa44eec2f Input: atmel_mxt_ts - disable IRQ across suspend
  Updating files: 100% (49483/49483), done.
  Previous HEAD position was 463fa44eec2f Input: atmel_mxt_ts - disable IRQ across suspend
  HEAD is now at 94710cac0ef4 Linux 4.18
  
  real    0m48.801s
  user    0m13.963s
  sys     0m5.114s


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Unreliable 'git rebase --onto'
  2020-01-09 15:03     ` SZEDER Gábor
@ 2020-01-09 17:53       ` Elijah Newren
  2020-01-21 19:18       ` [PATCH v1] rebase -i: stop checking out the tip of the branch to rebase Alban Gruin
  1 sibling, 0 replies; 22+ messages in thread
From: Elijah Newren @ 2020-01-09 17:53 UTC (permalink / raw)
  To: SZEDER Gábor
  Cc: Eugeniu Rosca, Junio C Hamano, Jeff King,
	Ævar Arnfjörð, Git Mailing List, Eugeniu Rosca

Hi,

On Thu, Jan 9, 2020 at 7:03 AM SZEDER Gábor <szeder.dev@gmail.com> wrote:
>
> On Wed, Jan 08, 2020 at 04:55:46PM -0800, Elijah Newren wrote:
> > > Alas, there is unexpected bad news: with that commit the runtime of
> > > your 'git rebase --onto' command goes from <1sec to over 50secs.
> > > Cc-ing Elijah, author of that patch...
> >
> > I see slowdown, but not nearly as big as you report:
>
> The linux repo is big, my notebook is small, the poor thing :)

It went to just over 64secs on my home laptop (older and with spinny
disks), so yeah, a big difference from my work machine which has an
SSD.

> > $ time git rebase -m --onto v4.18 463fa44eec2fef50~ 463fa44eec2fef50
> > warning: inexact rename detection was skipped due to too many files.
> > warning: you may want to set your merge.renamelimit variable to at
> > least 7216 and retry the command.
> > Successfully rebased and updated detached HEAD.
> >
> > real 0m13.305s
> > user 0m9.644s
> > sys 0m3.620s
>
> > Interestingly, turning off rename detection only speeds it up a little bit:
> > $ time git rebase -m -Xno-renames --onto v4.18 463fa44eec2fef50~
> > 463fa44eec2fef50
> > Successfully rebased and updated detached HEAD.
> >
> > real 0m11.955s
> > user 0m8.732s
> > sys 0m3.424s
> >
> >
> > This is an interesting testcase; I'm going to try to find some time to
> > dig in further.
>
> The culprits are two seemingly unnecessary back-and-forth checkouts.
>
> I didn't realize I could use 'git rebase -m', so ran some tests with
> it, and turns out that the slowdown started with 68aa495b59 (rebase:
> implement --merge via the interactive machinery, 2018-12-11), where
> the runtime suddenly went from <1.5s to 45+s.
>
> Running 'git rebase -i --onto <those-same-commits>' is just as slow,
> and it appears that it has always been (the oldest I tried was
> v1.8.0), and it spends a long time both before and after popping up
> the editor for the rebase instructions.  That's highly suspicious, so:
>
>   $ git log --oneline -1
>   94710cac0ef4 (HEAD, tag: v4.18) Linux 4.18
>   $ git rebase -i --onto v4.18 463fa44eec2fef50~ 463fa44eec2fef50
>   hint: Waiting for your editor to close the file...
>   # Hit ctrl-z in the editor
>   $ git log --oneline -1
>   463fa44eec2f (HEAD) Input: atmel_mxt_ts - disable IRQ across suspend
>
> Oh.
>
> So 'git rebase -i' apparently checks out the tip commit of the
> to-be-rebased revision range before invoking the editor for the rebase
> instructions, only to check out the --onto commit (i.e. the commit
> we've started from!) to apply the selected commit on top.
>
> And indeed those two checkouts account for all the wasted runtime:
>
>   $ time { git checkout 463fa44eec2fef50 && git checkout v4.18 ; }
>   Updating files: 100% (49483/49483), done.
>   Previous HEAD position was 94710cac0ef4 Linux 4.18
>   HEAD is now at 463fa44eec2f Input: atmel_mxt_ts - disable IRQ across suspend
>   Updating files: 100% (49483/49483), done.
>   Previous HEAD position was 463fa44eec2f Input: atmel_mxt_ts - disable IRQ across suspend
>   HEAD is now at 94710cac0ef4 Linux 4.18
>
>   real    0m48.801s
>   user    0m13.963s
>   sys     0m5.114s

Oh, cool, sounds like you're already investigating and found the problem.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Unreliable 'git rebase --onto'
  2020-01-09 10:53   ` Eugeniu Rosca
@ 2020-01-09 18:05     ` Elijah Newren
  2020-01-10  0:06       ` Eugeniu Rosca
  0 siblings, 1 reply; 22+ messages in thread
From: Elijah Newren @ 2020-01-09 18:05 UTC (permalink / raw)
  To: Eugeniu Rosca
  Cc: SZEDER Gábor, Junio C Hamano, Jeff King,
	Ævar Arnfjörð, Git Mailing List, Eugeniu Rosca

On Thu, Jan 9, 2020 at 2:53 AM Eugeniu Rosca <erosca@de.adit-jv.com> wrote:
>
> Hi Elijah, hi Szeder,
>
> On Wed, Jan 08, 2020 at 02:06:22PM -0800, Elijah Newren wrote:
> > This looks like a known bug in rebase, in particular in the am-backend that
> > rebase uses by default.  If I'm correct that it's just a context region
> > issue, then this is the same bug that was recently discussed at
> > https://lore.kernel.org/git/CAN_72e2h2avv-U9BVBYqXVKiC+5kHy-pjejyMSD3X22uRXE39g@mail.gmail.com/.
> > The current plan is to switch the default over to the merge backend (the
> > same machinery that cherry-pick uses), which doesn't suffer from the same
> > shortcomings (you can see the current work being done in this area at
> > https://lore.kernel.org/git/pull.679.v3.git.git.1577217299.gitgitgadget@gmail.com/
> > ).
>
> Thank you for your feedback and references, here and in [*].
>
> Once hit by this or similar issues, I think there is high chance for
> people to go through the same feelings as described by Pavel in [**]:
>
>   ---
>   That's so scary that I'm going to stop using "git rebase" for now.
>   ---

Yep, I understand; that kind of feeling is why I wanted to jump in and
try to help fix it.  I want merge/rebase/cherry-pick to be reliable.

> Some years ago I was hit by 'git merge' producing slightly different
> results compared to 'git rebase --onto' and 'git cherry-pick A..B'
> (maybe I can come up with a reproduction scenario for that too).

If you can, I'd be interested to see it and take a look.  I'd normally
assume it was just some case where A..B included "evil" merge commits
(merge commits that made additional changes not part of the actual
merging) since rebasing or cherry-picking such a range would exclude
the merge commits and thus drop those changes -- but you identified a
real bug with the default rebase backend so I'm interested to see if
you happen to have more bugs I should know about.

>
> Since then, I usually contrast the outcomes of merging, cherry-picking
> and rebasing, to make sure they match, but that's painful and
> time-consuming.
>
> > In the mean time, you can pass the -m flag to rebase to avoid these types
> > of problems.  In fact, if you could retry with -m you may be able to
> > confirm whether it's the same issue.
>
> Indeed, neither `git rebase -m` nor `git rebase -i` exhibit the problem.

That's good news.

Unfortunately, you should note that git-2.25 is going to have the same
bug you reported; there are still some loose ends with my series to
make -m the default, and the 2.25 release is expected within a few
days, so my change of default won't happen until 2.26.  (That series
would have needed to be completed several weeks ago for it to go into
2.25).

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Unreliable 'git rebase --onto'
  2020-01-09 18:05     ` Elijah Newren
@ 2020-01-10  0:06       ` Eugeniu Rosca
  2020-01-10  2:35         ` Elijah Newren
  0 siblings, 1 reply; 22+ messages in thread
From: Eugeniu Rosca @ 2020-01-10  0:06 UTC (permalink / raw)
  To: Elijah Newren
  Cc: Eugeniu Rosca, SZEDER Gábor, Junio C Hamano, Jeff King,
	Ævar Arnfjörð, Git Mailing List, Eugeniu Rosca

Hi Elijah,

On Thu, Jan 09, 2020 at 10:05:52AM -0800, Elijah Newren wrote:
> On Thu, Jan 9, 2020 at 2:53 AM Eugeniu Rosca <erosca@de.adit-jv.com> wrote:
> > Some years ago I was hit by 'git merge' producing slightly different
> > results compared to 'git rebase --onto' and 'git cherry-pick A..B'
> > (maybe I can come up with a reproduction scenario for that too).
> 
> If you can, I'd be interested to see it and take a look.  I'd normally
> assume it was just some case where A..B included "evil" merge commits
> (merge commits that made additional changes not part of the actual
> merging) since rebasing or cherry-picking such a range would exclude
> the merge commits and thus drop those changes -- but you identified a
> real bug with the default rebase backend so I'm interested to see if
> you happen to have more bugs I should know about.

Here is a _simplified_ scenario to get a totally unexpected result from
'git merge' (initially reproduced years ago, but still happening on
2.25.0.rc2):

   ## Preparation
0. git --version
   git version 2.25.0.rc2
1. git clone https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
2. git remote add linux-stable https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git
3. git fetch linux-stable

   # Reproduction
4. git checkout f7a8e38f07a1
5. git merge --no-edit e18da11fc0f959
   ## Merge v4.4.3 commit
   https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=e18da11fc0f959
   which is a linux-stable backport of vanilla v4.5-rc1 commit
   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=f7a8e38f07a1
   the latter being checked out at step 4.

6. git show HEAD
   ## Inspect the _automatic_ conflict resolution performed by git in
   drivers/mtd/nand/nand_base.c. Git decided to integrate e18da11fc0f959
   alongside f7a8e38f07a1, while essentially they are the same commit.
   We end up with two times commit f7a8e38f07a1.

What do you think about that?

> Unfortunately, you should note that git-2.25 is going to have the same
> bug you reported; there are still some loose ends with my series to
> make -m the default, and the 2.25 release is expected within a few
> days, so my change of default won't happen until 2.26.  (That series
> would have needed to be completed several weeks ago for it to go into
> 2.25).

Thanks for this piece of information and for the time/effort spent!

-- 
Best Regards,
Eugeniu

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Unreliable 'git rebase --onto'
  2020-01-10  0:06       ` Eugeniu Rosca
@ 2020-01-10  2:35         ` Elijah Newren
  0 siblings, 0 replies; 22+ messages in thread
From: Elijah Newren @ 2020-01-10  2:35 UTC (permalink / raw)
  To: Eugeniu Rosca
  Cc: Eugeniu Rosca, SZEDER Gábor, Junio C Hamano, Jeff King,
	Ævar Arnfjörð, Git Mailing List

Hi Eugeniu,

On Thu, Jan 9, 2020 at 4:06 PM Eugeniu Rosca <roscaeugeniu@gmail.com> wrote:
>
> Hi Elijah,
>
> On Thu, Jan 09, 2020 at 10:05:52AM -0800, Elijah Newren wrote:
> > On Thu, Jan 9, 2020 at 2:53 AM Eugeniu Rosca <erosca@de.adit-jv.com> wrote:
> > > Some years ago I was hit by 'git merge' producing slightly different
> > > results compared to 'git rebase --onto' and 'git cherry-pick A..B'
> > > (maybe I can come up with a reproduction scenario for that too).
> >
> > If you can, I'd be interested to see it and take a look.  I'd normally
> > assume it was just some case where A..B included "evil" merge commits
> > (merge commits that made additional changes not part of the actual
> > merging) since rebasing or cherry-picking such a range would exclude
> > the merge commits and thus drop those changes -- but you identified a
> > real bug with the default rebase backend so I'm interested to see if
> > you happen to have more bugs I should know about.
>
> Here is a _simplified_ scenario to get a totally unexpected result from
> 'git merge' (initially reproduced years ago, but still happening on
> 2.25.0.rc2):
>
>    ## Preparation
> 0. git --version
>    git version 2.25.0.rc2
> 1. git clone https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
> 2. git remote add linux-stable https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git
> 3. git fetch linux-stable
>
>    # Reproduction
> 4. git checkout f7a8e38f07a1
> 5. git merge --no-edit e18da11fc0f959
>    ## Merge v4.4.3 commit
>    https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=e18da11fc0f959
>    which is a linux-stable backport of vanilla v4.5-rc1 commit
>    https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=f7a8e38f07a1
>    the latter being checked out at step 4.
>
> 6. git show HEAD
>    ## Inspect the _automatic_ conflict resolution performed by git in
>    drivers/mtd/nand/nand_base.c. Git decided to integrate e18da11fc0f959
>    alongside f7a8e38f07a1, while essentially they are the same commit.
>    We end up with two times commit f7a8e38f07a1.
>
> What do you think about that?

Ooh, interesting case; thanks for sending it along.  I think this is
the same as https://lore.kernel.org/git/20190816184051.GB13894@sigill.intra.peff.net/
, which struck git itself not that long back.  It didn't do any actual
harm, though, it was just surprising.  I'm not familiar with the xdiff
part of the codebase, so I don't know if this is a heuristic thing, or
something more along the lines of the diff3 issues mentioned at
https://www.cis.upenn.edu/~bcpierce/papers/diff3-short.pdf.  I read up
on this area a little bit a few months ago and I'd like to dig more at
the diff3 stuff in general, but it may be a little while.  If you see
more issues like this, though, I'm definitely interested in saving and
cataloging them for when I get back to this.

Elijah

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH v1] rebase -i: stop checking out the tip of the branch to rebase
  2020-01-09 15:03     ` SZEDER Gábor
  2020-01-09 17:53       ` Elijah Newren
@ 2020-01-21 19:18       ` Alban Gruin
  2020-01-21 20:07         ` Elijah Newren
                           ` (3 more replies)
  1 sibling, 4 replies; 22+ messages in thread
From: Alban Gruin @ 2020-01-21 19:18 UTC (permalink / raw)
  To: SZEDER Gábor, git, Elijah Newren
  Cc: Ævar Arnfjörð, Eugeniu Rosca, Junio C Hamano,
	Jeff King, Eugeniu Rosca, Alban Gruin

One of the first things done by the interactive rebase is to make a todo
list.  This requires knowledge of the commit range to rebase.  To get
the oid of the last commit of the range, the tip of the branch to rebase
is checked out with prepare_branch_to_be_rebased(), then the oid of the
HEAD is read.  On big repositories, it's a performance penalty: the user
may have to wait before editing the todo list while git is extracting the
branch silently (because git-checkout is silenced here).  After this,
the head of the branch is not even modified.

Since we already have the oid of the tip of the branch in
`opts->orig_head', it's useless to switch to this commit.

This removes the call to prepare_branch_to_be_rebased() in
do_interactive_rebase(), and adds a `orig_head' parameter to
get_revision_ranges().  prepare_branch_to_be_rebased() is removed as it
is no longer used.

This introduces a visible change: as we do not switch on the tip of the
branch to rebase, no reflog entry is created at the beginning of the
rebase for it.

Reported-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Alban Gruin <alban.gruin@gmail.com>
---

Notes:
    Improvements brought by this patch:
    
    Before:
    
    $ time git rebase -m --onto v4.18 463fa44eec2fef50~ 463fa44eec2fef50
    
    real    0m8,940s
    user    0m6,830s
    sys     0m2,121s
    
    After:
    
    $ time git rebase -m --onto v4.18 463fa44eec2fef50~ 463fa44eec2fef50
    
    real    0m1,834s
    user    0m0,916s
    sys     0m0,206s
    
    Both tests have been performed on a 5400 RPM SATA III hard drive.

 builtin/rebase.c | 18 +++++-------------
 sequencer.c      | 14 --------------
 sequencer.h      |  3 ---
 3 files changed, 5 insertions(+), 30 deletions(-)

diff --git a/builtin/rebase.c b/builtin/rebase.c
index 8081741f8a..6154ad8fa5 100644
--- a/builtin/rebase.c
+++ b/builtin/rebase.c
@@ -246,21 +246,17 @@ static int edit_todo_file(unsigned flags)
 }
 
 static int get_revision_ranges(struct commit *upstream, struct commit *onto,
-			       const char **head_hash,
+			       struct object_id *orig_head, const char **head_hash,
 			       char **revisions, char **shortrevisions)
 {
 	struct commit *base_rev = upstream ? upstream : onto;
 	const char *shorthead;
-	struct object_id orig_head;
-
-	if (get_oid("HEAD", &orig_head))
-		return error(_("no HEAD?"));
 
-	*head_hash = find_unique_abbrev(&orig_head, GIT_MAX_HEXSZ);
+	*head_hash = find_unique_abbrev(orig_head, GIT_MAX_HEXSZ);
 	*revisions = xstrfmt("%s...%s", oid_to_hex(&base_rev->object.oid),
 						   *head_hash);
 
-	shorthead = find_unique_abbrev(&orig_head, DEFAULT_ABBREV);
+	shorthead = find_unique_abbrev(orig_head, DEFAULT_ABBREV);
 
 	if (upstream) {
 		const char *shortrev;
@@ -314,12 +310,8 @@ static int do_interactive_rebase(struct rebase_options *opts, unsigned flags)
 	struct replay_opts replay = get_replay_opts(opts);
 	struct string_list commands = STRING_LIST_INIT_DUP;
 
-	if (prepare_branch_to_be_rebased(the_repository, &replay,
-					 opts->switch_to))
-		return -1;
-
-	if (get_revision_ranges(opts->upstream, opts->onto, &head_hash,
-				&revisions, &shortrevisions))
+	if (get_revision_ranges(opts->upstream, opts->onto, &opts->orig_head,
+				&head_hash, &revisions, &shortrevisions))
 		return -1;
 
 	if (init_basic_state(&replay,
diff --git a/sequencer.c b/sequencer.c
index b9dbf1adb0..4dc245d7ec 100644
--- a/sequencer.c
+++ b/sequencer.c
@@ -3715,20 +3715,6 @@ static int run_git_checkout(struct repository *r, struct replay_opts *opts,
 	return ret;
 }
 
-int prepare_branch_to_be_rebased(struct repository *r, struct replay_opts *opts,
-				 const char *commit)
-{
-	const char *action;
-
-	if (commit && *commit) {
-		action = reflog_message(opts, "start", "checkout %s", commit);
-		if (run_git_checkout(r, opts, commit, action))
-			return error(_("could not checkout %s"), commit);
-	}
-
-	return 0;
-}
-
 static int checkout_onto(struct repository *r, struct replay_opts *opts,
 			 const char *onto_name, const struct object_id *onto,
 			 const char *orig_head)
diff --git a/sequencer.h b/sequencer.h
index 9f9ae291e3..74f1e2673e 100644
--- a/sequencer.h
+++ b/sequencer.h
@@ -190,9 +190,6 @@ void commit_post_rewrite(struct repository *r,
 			 const struct commit *current_head,
 			 const struct object_id *new_head);
 
-int prepare_branch_to_be_rebased(struct repository *r, struct replay_opts *opts,
-				 const char *commit);
-
 #define SUMMARY_INITIAL_COMMIT   (1 << 0)
 #define SUMMARY_SHOW_AUTHOR_DATE (1 << 1)
 void print_commit_summary(struct repository *repo,
-- 
2.24.1


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH v1] rebase -i: stop checking out the tip of the branch to rebase
  2020-01-21 19:18       ` [PATCH v1] rebase -i: stop checking out the tip of the branch to rebase Alban Gruin
@ 2020-01-21 20:07         ` Elijah Newren
  2020-01-22 20:24         ` Junio C Hamano
                           ` (2 subsequent siblings)
  3 siblings, 0 replies; 22+ messages in thread
From: Elijah Newren @ 2020-01-21 20:07 UTC (permalink / raw)
  To: Alban Gruin
  Cc: SZEDER Gábor, Git Mailing List, Ævar Arnfjörð,
	Eugeniu Rosca, Junio C Hamano, Jeff King, Eugeniu Rosca,
	Phillip Wood, Johannes Schindelin

Hi Alban,

// Adding Phillip and Johannes since they know the sequencer internals
very well.

On Tue, Jan 21, 2020 at 11:21 AM Alban Gruin <alban.gruin@gmail.com> wrote:
>
> One of the first things done by the interactive rebase is to make a todo
> list.  This requires knowledge of the commit range to rebase.  To get
> the oid of the last commit of the range, the tip of the branch to rebase
> is checked out with prepare_branch_to_be_rebased(), then the oid of the
> HEAD is read.  On big repositories, it's a performance penalty: the user
> may have to wait before editing the todo list while git is extracting the
> branch silently (because git-checkout is silenced here).  After this,
> the head of the branch is not even modified.
>
> Since we already have the oid of the tip of the branch in
> `opts->orig_head', it's useless to switch to this commit.
>
> This removes the call to prepare_branch_to_be_rebased() in
> do_interactive_rebase(), and adds a `orig_head' parameter to
> get_revision_ranges().  prepare_branch_to_be_rebased() is removed as it
> is no longer used.
>
> This introduces a visible change: as we do not switch on the tip of the
> branch to rebase, no reflog entry is created at the beginning of the
> rebase for it.

Oh, sweet, thanks for digging in.  I had also dug in just after the
report, but not quite far enough as I still had failing tests and I
was feeling a bit stretched thin on other projects so I punted hoping
that SZEDER would post something.  Looks like the orig_head thing was
probably what I was missing.

I was a little surprised that there wasn't any regression test that
needed to be modified, as it reminded me of a previous conversation
about excessive work in the interactive backend[1], but after looking
it up that was apparently about too many calls to commit rather than
too many calls to checkout.

[1] https://lore.kernel.org/git/nycvar.QRO.7.76.6.1811121614190.39@tvgsbejvaqbjf.bet/

> Reported-by: SZEDER Gábor <szeder.dev@gmail.com>
> Signed-off-by: Alban Gruin <alban.gruin@gmail.com>
> ---
>
> Notes:
>     Improvements brought by this patch:
>
>     Before:
>
>     $ time git rebase -m --onto v4.18 463fa44eec2fef50~ 463fa44eec2fef50
>
>     real    0m8,940s
>     user    0m6,830s
>     sys     0m2,121s
>
>     After:
>
>     $ time git rebase -m --onto v4.18 463fa44eec2fef50~ 463fa44eec2fef50
>
>     real    0m1,834s
>     user    0m0,916s
>     sys     0m0,206s

Nice...do we want to mention this in the commit message proper too?

>
>     Both tests have been performed on a 5400 RPM SATA III hard drive.
>
>  builtin/rebase.c | 18 +++++-------------
>  sequencer.c      | 14 --------------
>  sequencer.h      |  3 ---
>  3 files changed, 5 insertions(+), 30 deletions(-)
>
> diff --git a/builtin/rebase.c b/builtin/rebase.c
> index 8081741f8a..6154ad8fa5 100644
> --- a/builtin/rebase.c
> +++ b/builtin/rebase.c
> @@ -246,21 +246,17 @@ static int edit_todo_file(unsigned flags)
>  }
>
>  static int get_revision_ranges(struct commit *upstream, struct commit *onto,
> -                              const char **head_hash,
> +                              struct object_id *orig_head, const char **head_hash,
>                                char **revisions, char **shortrevisions)
>  {
>         struct commit *base_rev = upstream ? upstream : onto;
>         const char *shorthead;
> -       struct object_id orig_head;
> -
> -       if (get_oid("HEAD", &orig_head))
> -               return error(_("no HEAD?"));
>
> -       *head_hash = find_unique_abbrev(&orig_head, GIT_MAX_HEXSZ);
> +       *head_hash = find_unique_abbrev(orig_head, GIT_MAX_HEXSZ);
>         *revisions = xstrfmt("%s...%s", oid_to_hex(&base_rev->object.oid),
>                                                    *head_hash);
>
> -       shorthead = find_unique_abbrev(&orig_head, DEFAULT_ABBREV);
> +       shorthead = find_unique_abbrev(orig_head, DEFAULT_ABBREV);
>
>         if (upstream) {
>                 const char *shortrev;
> @@ -314,12 +310,8 @@ static int do_interactive_rebase(struct rebase_options *opts, unsigned flags)
>         struct replay_opts replay = get_replay_opts(opts);
>         struct string_list commands = STRING_LIST_INIT_DUP;
>
> -       if (prepare_branch_to_be_rebased(the_repository, &replay,
> -                                        opts->switch_to))
> -               return -1;
> -
> -       if (get_revision_ranges(opts->upstream, opts->onto, &head_hash,
> -                               &revisions, &shortrevisions))
> +       if (get_revision_ranges(opts->upstream, opts->onto, &opts->orig_head,
> +                               &head_hash, &revisions, &shortrevisions))
>                 return -1;
>
>         if (init_basic_state(&replay,
> diff --git a/sequencer.c b/sequencer.c
> index b9dbf1adb0..4dc245d7ec 100644
> --- a/sequencer.c
> +++ b/sequencer.c
> @@ -3715,20 +3715,6 @@ static int run_git_checkout(struct repository *r, struct replay_opts *opts,
>         return ret;
>  }
>
> -int prepare_branch_to_be_rebased(struct repository *r, struct replay_opts *opts,
> -                                const char *commit)
> -{
> -       const char *action;
> -
> -       if (commit && *commit) {
> -               action = reflog_message(opts, "start", "checkout %s", commit);
> -               if (run_git_checkout(r, opts, commit, action))
> -                       return error(_("could not checkout %s"), commit);
> -       }
> -
> -       return 0;
> -}
> -
>  static int checkout_onto(struct repository *r, struct replay_opts *opts,
>                          const char *onto_name, const struct object_id *onto,
>                          const char *orig_head)
> diff --git a/sequencer.h b/sequencer.h
> index 9f9ae291e3..74f1e2673e 100644
> --- a/sequencer.h
> +++ b/sequencer.h
> @@ -190,9 +190,6 @@ void commit_post_rewrite(struct repository *r,
>                          const struct commit *current_head,
>                          const struct object_id *new_head);
>
> -int prepare_branch_to_be_rebased(struct repository *r, struct replay_opts *opts,
> -                                const char *commit);
> -
>  #define SUMMARY_INITIAL_COMMIT   (1 << 0)
>  #define SUMMARY_SHOW_AUTHOR_DATE (1 << 1)
>  void print_commit_summary(struct repository *repo,
> --
> 2.24.1

The code looks reasonable to me, but I'm still not completely familiar
with all the rebase and sequencer code so I'm hoping Phillip or
Johannes can give a thumbs up.

Thanks for digging into this and figuring out the bits that I missed
when I tried.


Elijah

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v1] rebase -i: stop checking out the tip of the branch to rebase
  2020-01-21 19:18       ` [PATCH v1] rebase -i: stop checking out the tip of the branch to rebase Alban Gruin
  2020-01-21 20:07         ` Elijah Newren
@ 2020-01-22 20:24         ` Junio C Hamano
  2020-01-22 20:47         ` Junio C Hamano
  2020-01-24 14:45         ` [PATCH v2] " Alban Gruin
  3 siblings, 0 replies; 22+ messages in thread
From: Junio C Hamano @ 2020-01-22 20:24 UTC (permalink / raw)
  To: Alban Gruin
  Cc: SZEDER Gábor, git, Elijah Newren,
	Ævar Arnfjörð, Eugeniu Rosca, Jeff King,
	Eugeniu Rosca

Alban Gruin <alban.gruin@gmail.com> writes:

> This introduces a visible change: as we do not switch on the tip of the
> branch to rebase, no reflog entry is created at the beginning of the
> rebase for it.

Fortunately, this does not mean "git log @{-1}.." during a rebase
starts to behave differently.  If it were the case, that would have
been a bad regression, but that is not the case.

If you omit one reflog entry, you are creating TWO changes.  The
detaching of the HEAD to the "onto" commit would record, just like
any other reflog entry, would record from which commit we detached
HEAD from.  That reflog entry will also be different, as you'd be
switching from a different commit.  

I am not sure what the implication of this difference will be in
practice, though, but it must be smaller than the effect of the
missing reflog entry discussed earlier.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v1] rebase -i: stop checking out the tip of the branch to rebase
  2020-01-21 19:18       ` [PATCH v1] rebase -i: stop checking out the tip of the branch to rebase Alban Gruin
  2020-01-21 20:07         ` Elijah Newren
  2020-01-22 20:24         ` Junio C Hamano
@ 2020-01-22 20:47         ` Junio C Hamano
  2020-01-24 14:45           ` Alban Gruin
  2020-01-24 14:45         ` [PATCH v2] " Alban Gruin
  3 siblings, 1 reply; 22+ messages in thread
From: Junio C Hamano @ 2020-01-22 20:47 UTC (permalink / raw)
  To: Alban Gruin
  Cc: SZEDER Gábor, git, Elijah Newren,
	Ævar Arnfjörð, Eugeniu Rosca, Jeff King,
	Eugeniu Rosca

Alban Gruin <alban.gruin@gmail.com> writes:

> One of the first things done by the interactive rebase is to make a todo
> list.  This requires knowledge of the commit range to rebase.  To get
> the oid of the last commit of the range, the tip of the branch to rebase
> is checked out with prepare_branch_to_be_rebased(), then the oid of the
> HEAD is read.  On big repositories, it's a performance penalty: the user
> may have to wait before editing the todo list while git is extracting the
> branch silently (because git-checkout is silenced here).  After this,
> the head of the branch is not even modified.

Hmph.  One curious thing in the above is why this is specific to
"rebase -i". The need to know the commit range to rebase is shared
across any rebase backend, and it would be the most natural to parse
the optional second argument (i.e. the branch or the commit to
rebase) before builtin/rebase.c dispatches to a specific rebase
backend, wouldn't it?  So, the question is why a normal "rebase"
does not need the same fix?

If the answer is "rebase in general was fine without extra checkout,
but 'rebase -i' was doing an unnecessary checkout" (or any other
answer) that is something that would help future readers to record
in the commit log message.

Thanks.



^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v1] rebase -i: stop checking out the tip of the branch to rebase
  2020-01-22 20:47         ` Junio C Hamano
@ 2020-01-24 14:45           ` Alban Gruin
  0 siblings, 0 replies; 22+ messages in thread
From: Alban Gruin @ 2020-01-24 14:45 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: SZEDER Gábor, git, Elijah Newren,
	Ævar Arnfjörð, Eugeniu Rosca, Jeff King,
	Eugeniu Rosca

Hi Junio,

Le 22/01/2020 à 21:47, Junio C Hamano a écrit :
> Alban Gruin <alban.gruin@gmail.com> writes:
> 
>> One of the first things done by the interactive rebase is to make a todo
>> list.  This requires knowledge of the commit range to rebase.  To get
>> the oid of the last commit of the range, the tip of the branch to rebase
>> is checked out with prepare_branch_to_be_rebased(), then the oid of the
>> HEAD is read.  On big repositories, it's a performance penalty: the user
>> may have to wait before editing the todo list while git is extracting the
>> branch silently (because git-checkout is silenced here).  After this,
>> the head of the branch is not even modified.
> 
> Hmph.  One curious thing in the above is why this is specific to
> "rebase -i". The need to know the commit range to rebase is shared
> across any rebase backend, and it would be the most natural to parse
> the optional second argument (i.e. the branch or the commit to
> rebase) before builtin/rebase.c dispatches to a specific rebase
> backend, wouldn't it?  So, the question is why a normal "rebase"
> does not need the same fix?
> 

That's a problem shared by all rebases using the sequencer, so -m and -r
are also affected by this.  `am' is not.

> If the answer is "rebase in general was fine without extra checkout,
> but 'rebase -i' was doing an unnecessary checkout" (or any other
> answer) that is something that would help future readers to record
> in the commit log message.
> 

So yes, the answer is that the am backend does not perform this
checkout, unlike all others rebases.

I will resend this patch very soon.

> Thanks.
> 
> 

Cheers,
Alban


^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH v2] rebase -i: stop checking out the tip of the branch to rebase
  2020-01-21 19:18       ` [PATCH v1] rebase -i: stop checking out the tip of the branch to rebase Alban Gruin
                           ` (2 preceding siblings ...)
  2020-01-22 20:47         ` Junio C Hamano
@ 2020-01-24 14:45         ` Alban Gruin
  2020-01-24 14:55           ` Alban Gruin
                             ` (2 more replies)
  3 siblings, 3 replies; 22+ messages in thread
From: Alban Gruin @ 2020-01-24 14:45 UTC (permalink / raw)
  To: SZEDER Gábor, git, Elijah Newren
  Cc: Ævar Arnfjörð, Eugeniu Rosca, Junio C Hamano,
	Jeff King, Eugeniu Rosca, Phillip Wood, Johannes Schindelin,
	Alban Gruin

One of the first things done when using a sequencer-based
rebase (ie. `rebase -i', `rebase -r', or `rebase -m') is to make a todo
list.  This requires knowledge of the commit range to rebase.  To get
the oid of the last commit of the range, the tip of the branch to rebase
is checked out with prepare_branch_to_be_rebased(), then the oid of the
head is read.  After this, the tip of the branch is not even modified.

On big repositories, it's a performance penalty: with `rebase -i', the
user may have to wait before editing the todo list while git is
extracting the branch silently, and "quiet" rebases will be slower than
`am'.

Since we already have the oid of the tip of the branch in
`opts->orig_head', it's useless to switch to this commit.

This removes the call to prepare_branch_to_be_rebased() in
do_interactive_rebase(), and adds a `orig_head' parameter to
get_revision_ranges().  prepare_branch_to_be_rebased() is removed as it
is no longer used.

This introduces a visible change: as we do not switch on the tip of the
branch to rebase, no reflog entry is created at the beginning of the
rebase for it.

Unscientific performance measurements, performed on linux.git, are as
follow:

  Before this patch:

    $ time git rebase -m --onto v4.18 463fa44eec2fef50~ 463fa44eec2fef50

    real    0m8,940s
    user    0m6,830s
    sys     0m2,121s

  After this patch:

    $ time git rebase -m --onto v4.18 463fa44eec2fef50~ 463fa44eec2fef50

    real    0m1,834s
    user    0m0,916s
    sys     0m0,206s

Reported-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Alban Gruin <alban.gruin@gmail.com>
---

Notes:
    Changes since v1:
    
     - The first version of the commit message talked specifically about
       `rebase -i', but this problem is common to all sequencer-based
       rebases.  The first paragraph has been reworded to clear up the
       confusion.
    
     - Included benchmarks in the commit message, as suggested by Elijah
       Newren.
    
    The code did not change.

 builtin/rebase.c | 18 +++++-------------
 sequencer.c      | 14 --------------
 sequencer.h      |  3 ---
 3 files changed, 5 insertions(+), 30 deletions(-)

diff --git a/builtin/rebase.c b/builtin/rebase.c
index 8081741f8a..6154ad8fa5 100644
--- a/builtin/rebase.c
+++ b/builtin/rebase.c
@@ -246,21 +246,17 @@ static int edit_todo_file(unsigned flags)
 }
 
 static int get_revision_ranges(struct commit *upstream, struct commit *onto,
-			       const char **head_hash,
+			       struct object_id *orig_head, const char **head_hash,
 			       char **revisions, char **shortrevisions)
 {
 	struct commit *base_rev = upstream ? upstream : onto;
 	const char *shorthead;
-	struct object_id orig_head;
-
-	if (get_oid("HEAD", &orig_head))
-		return error(_("no HEAD?"));
 
-	*head_hash = find_unique_abbrev(&orig_head, GIT_MAX_HEXSZ);
+	*head_hash = find_unique_abbrev(orig_head, GIT_MAX_HEXSZ);
 	*revisions = xstrfmt("%s...%s", oid_to_hex(&base_rev->object.oid),
 						   *head_hash);
 
-	shorthead = find_unique_abbrev(&orig_head, DEFAULT_ABBREV);
+	shorthead = find_unique_abbrev(orig_head, DEFAULT_ABBREV);
 
 	if (upstream) {
 		const char *shortrev;
@@ -314,12 +310,8 @@ static int do_interactive_rebase(struct rebase_options *opts, unsigned flags)
 	struct replay_opts replay = get_replay_opts(opts);
 	struct string_list commands = STRING_LIST_INIT_DUP;
 
-	if (prepare_branch_to_be_rebased(the_repository, &replay,
-					 opts->switch_to))
-		return -1;
-
-	if (get_revision_ranges(opts->upstream, opts->onto, &head_hash,
-				&revisions, &shortrevisions))
+	if (get_revision_ranges(opts->upstream, opts->onto, &opts->orig_head,
+				&head_hash, &revisions, &shortrevisions))
 		return -1;
 
 	if (init_basic_state(&replay,
diff --git a/sequencer.c b/sequencer.c
index b9dbf1adb0..4dc245d7ec 100644
--- a/sequencer.c
+++ b/sequencer.c
@@ -3715,20 +3715,6 @@ static int run_git_checkout(struct repository *r, struct replay_opts *opts,
 	return ret;
 }
 
-int prepare_branch_to_be_rebased(struct repository *r, struct replay_opts *opts,
-				 const char *commit)
-{
-	const char *action;
-
-	if (commit && *commit) {
-		action = reflog_message(opts, "start", "checkout %s", commit);
-		if (run_git_checkout(r, opts, commit, action))
-			return error(_("could not checkout %s"), commit);
-	}
-
-	return 0;
-}
-
 static int checkout_onto(struct repository *r, struct replay_opts *opts,
 			 const char *onto_name, const struct object_id *onto,
 			 const char *orig_head)
diff --git a/sequencer.h b/sequencer.h
index 9f9ae291e3..74f1e2673e 100644
--- a/sequencer.h
+++ b/sequencer.h
@@ -190,9 +190,6 @@ void commit_post_rewrite(struct repository *r,
 			 const struct commit *current_head,
 			 const struct object_id *new_head);
 
-int prepare_branch_to_be_rebased(struct repository *r, struct replay_opts *opts,
-				 const char *commit);
-
 #define SUMMARY_INITIAL_COMMIT   (1 << 0)
 #define SUMMARY_SHOW_AUTHOR_DATE (1 << 1)
 void print_commit_summary(struct repository *repo,
-- 
2.24.1


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH v2] rebase -i: stop checking out the tip of the branch to rebase
  2020-01-24 14:45         ` [PATCH v2] " Alban Gruin
@ 2020-01-24 14:55           ` Alban Gruin
  2020-01-24 18:12             ` Junio C Hamano
  2020-01-24 15:05           ` [PATCH v3] " Alban Gruin
  2020-01-24 17:11           ` [PATCH v2] " Andrei Rybak
  2 siblings, 1 reply; 22+ messages in thread
From: Alban Gruin @ 2020-01-24 14:55 UTC (permalink / raw)
  To: SZEDER Gábor, git, Elijah Newren
  Cc: Ævar Arnfjörð, Eugeniu Rosca, Junio C Hamano,
	Jeff King, Eugeniu Rosca, Phillip Wood, Johannes Schindelin

Le 24/01/2020 à 15:45, Alban Gruin a écrit :
> One of the first things done when using a sequencer-based
> rebase (ie. `rebase -i', `rebase -r', or `rebase -m') is to make a todo
> list.  This requires knowledge of the commit range to rebase.  To get
> the oid of the last commit of the range, the tip of the branch to rebase
> is checked out with prepare_branch_to_be_rebased(), then the oid of the
> head is read.  After this, the tip of the branch is not even modified.
> 
> On big repositories, it's a performance penalty: with `rebase -i', the
> user may have to wait before editing the todo list while git is
> extracting the branch silently, and "quiet" rebases will be slower than
> `am'.
> 
> Since we already have the oid of the tip of the branch in
> `opts->orig_head', it's useless to switch to this commit.
> 
> This removes the call to prepare_branch_to_be_rebased() in
> do_interactive_rebase(), and adds a `orig_head' parameter to
> get_revision_ranges().  prepare_branch_to_be_rebased() is removed as it
> is no longer used.
> 
> This introduces a visible change: as we do not switch on the tip of the
> branch to rebase, no reflog entry is created at the beginning of the
> rebase for it.
> 
> Unscientific performance measurements, performed on linux.git, are as
> follow:
> 
>   Before this patch:
> 
>     $ time git rebase -m --onto v4.18 463fa44eec2fef50~ 463fa44eec2fef50
> 
>     real    0m8,940s
>     user    0m6,830s
>     sys     0m2,121s
> 
>   After this patch:
> 
>     $ time git rebase -m --onto v4.18 463fa44eec2fef50~ 463fa44eec2fef50
> 
>     real    0m1,834s
>     user    0m0,916s
>     sys     0m0,206s
> 
> Reported-by: SZEDER Gábor <szeder.dev@gmail.com>
> Signed-off-by: Alban Gruin <alban.gruin@gmail.com>
> ---
> 

Forget this patch, I forgot to clearly say that the `am' backend is not
affected.

> Notes:
>     Changes since v1:
>     
>      - The first version of the commit message talked specifically about
>        `rebase -i', but this problem is common to all sequencer-based
>        rebases.  The first paragraph has been reworded to clear up the
>        confusion.
>     
>      - Included benchmarks in the commit message, as suggested by Elijah
>        Newren.
>     
>     The code did not change.
> 
>  builtin/rebase.c | 18 +++++-------------
>  sequencer.c      | 14 --------------
>  sequencer.h      |  3 ---
>  3 files changed, 5 insertions(+), 30 deletions(-)
> 
> diff --git a/builtin/rebase.c b/builtin/rebase.c
> index 8081741f8a..6154ad8fa5 100644
> --- a/builtin/rebase.c
> +++ b/builtin/rebase.c
> @@ -246,21 +246,17 @@ static int edit_todo_file(unsigned flags)
>  }
>  
>  static int get_revision_ranges(struct commit *upstream, struct commit *onto,
> -			       const char **head_hash,
> +			       struct object_id *orig_head, const char **head_hash,
>  			       char **revisions, char **shortrevisions)
>  {
>  	struct commit *base_rev = upstream ? upstream : onto;
>  	const char *shorthead;
> -	struct object_id orig_head;
> -
> -	if (get_oid("HEAD", &orig_head))
> -		return error(_("no HEAD?"));
>  
> -	*head_hash = find_unique_abbrev(&orig_head, GIT_MAX_HEXSZ);
> +	*head_hash = find_unique_abbrev(orig_head, GIT_MAX_HEXSZ);
>  	*revisions = xstrfmt("%s...%s", oid_to_hex(&base_rev->object.oid),
>  						   *head_hash);
>  
> -	shorthead = find_unique_abbrev(&orig_head, DEFAULT_ABBREV);
> +	shorthead = find_unique_abbrev(orig_head, DEFAULT_ABBREV);
>  
>  	if (upstream) {
>  		const char *shortrev;
> @@ -314,12 +310,8 @@ static int do_interactive_rebase(struct rebase_options *opts, unsigned flags)
>  	struct replay_opts replay = get_replay_opts(opts);
>  	struct string_list commands = STRING_LIST_INIT_DUP;
>  
> -	if (prepare_branch_to_be_rebased(the_repository, &replay,
> -					 opts->switch_to))
> -		return -1;
> -
> -	if (get_revision_ranges(opts->upstream, opts->onto, &head_hash,
> -				&revisions, &shortrevisions))
> +	if (get_revision_ranges(opts->upstream, opts->onto, &opts->orig_head,
> +				&head_hash, &revisions, &shortrevisions))
>  		return -1;
>  
>  	if (init_basic_state(&replay,
> diff --git a/sequencer.c b/sequencer.c
> index b9dbf1adb0..4dc245d7ec 100644
> --- a/sequencer.c
> +++ b/sequencer.c
> @@ -3715,20 +3715,6 @@ static int run_git_checkout(struct repository *r, struct replay_opts *opts,
>  	return ret;
>  }
>  
> -int prepare_branch_to_be_rebased(struct repository *r, struct replay_opts *opts,
> -				 const char *commit)
> -{
> -	const char *action;
> -
> -	if (commit && *commit) {
> -		action = reflog_message(opts, "start", "checkout %s", commit);
> -		if (run_git_checkout(r, opts, commit, action))
> -			return error(_("could not checkout %s"), commit);
> -	}
> -
> -	return 0;
> -}
> -
>  static int checkout_onto(struct repository *r, struct replay_opts *opts,
>  			 const char *onto_name, const struct object_id *onto,
>  			 const char *orig_head)
> diff --git a/sequencer.h b/sequencer.h
> index 9f9ae291e3..74f1e2673e 100644
> --- a/sequencer.h
> +++ b/sequencer.h
> @@ -190,9 +190,6 @@ void commit_post_rewrite(struct repository *r,
>  			 const struct commit *current_head,
>  			 const struct object_id *new_head);
>  
> -int prepare_branch_to_be_rebased(struct repository *r, struct replay_opts *opts,
> -				 const char *commit);
> -
>  #define SUMMARY_INITIAL_COMMIT   (1 << 0)
>  #define SUMMARY_SHOW_AUTHOR_DATE (1 << 1)
>  void print_commit_summary(struct repository *repo,
> 


^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH v3] rebase -i: stop checking out the tip of the branch to rebase
  2020-01-24 14:45         ` [PATCH v2] " Alban Gruin
  2020-01-24 14:55           ` Alban Gruin
@ 2020-01-24 15:05           ` Alban Gruin
  2020-01-24 18:30             ` Junio C Hamano
  2020-02-05 14:31             ` Johannes Schindelin
  2020-01-24 17:11           ` [PATCH v2] " Andrei Rybak
  2 siblings, 2 replies; 22+ messages in thread
From: Alban Gruin @ 2020-01-24 15:05 UTC (permalink / raw)
  To: SZEDER Gábor, git, Elijah Newren
  Cc: Ævar Arnfjörð, Eugeniu Rosca, Junio C Hamano,
	Jeff King, Eugeniu Rosca, Phillip Wood, Johannes Schindelin,
	Alban Gruin

One of the first things done when using a sequencer-based
rebase (ie. `rebase -i', `rebase -r', or `rebase -m') is to make a todo
list.  This requires knowledge of the commit range to rebase.  To get
the oid of the last commit of the range, the tip of the branch to rebase
is checked out with prepare_branch_to_be_rebased(), then the oid of the
head is read.  After this, the tip of the branch is not even modified.
The `am' backend, on the other hand, does not check out the branch.

On big repositories, it's a performance penalty: with `rebase -i', the
user may have to wait before editing the todo list while git is
extracting the branch silently, and "quiet" rebases will be slower than
`am'.

Since we already have the oid of the tip of the branch in
`opts->orig_head', it's useless to switch to this commit.

This removes the call to prepare_branch_to_be_rebased() in
do_interactive_rebase(), and adds a `orig_head' parameter to
get_revision_ranges().  prepare_branch_to_be_rebased() is removed as it
is no longer used.

This introduces a visible change: as we do not switch on the tip of the
branch to rebase, no reflog entry is created at the beginning of the
rebase for it.

Unscientific performance measurements, performed on linux.git, are as
follow:

  Before this patch:

    $ time git rebase -m --onto v4.18 463fa44eec2fef50~ 463fa44eec2fef50

    real    0m8,940s
    user    0m6,830s
    sys     0m2,121s

  After this patch:

    $ time git rebase -m --onto v4.18 463fa44eec2fef50~ 463fa44eec2fef50

    real    0m1,834s
    user    0m0,916s
    sys     0m0,206s

Reported-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Alban Gruin <alban.gruin@gmail.com>
---

Added a line in the first paragraph to make it clear that the `am'
backend is not affected.

 builtin/rebase.c | 18 +++++-------------
 sequencer.c      | 14 --------------
 sequencer.h      |  3 ---
 3 files changed, 5 insertions(+), 30 deletions(-)

diff --git a/builtin/rebase.c b/builtin/rebase.c
index 8081741f8a..6154ad8fa5 100644
--- a/builtin/rebase.c
+++ b/builtin/rebase.c
@@ -246,21 +246,17 @@ static int edit_todo_file(unsigned flags)
 }
 
 static int get_revision_ranges(struct commit *upstream, struct commit *onto,
-			       const char **head_hash,
+			       struct object_id *orig_head, const char **head_hash,
 			       char **revisions, char **shortrevisions)
 {
 	struct commit *base_rev = upstream ? upstream : onto;
 	const char *shorthead;
-	struct object_id orig_head;
-
-	if (get_oid("HEAD", &orig_head))
-		return error(_("no HEAD?"));
 
-	*head_hash = find_unique_abbrev(&orig_head, GIT_MAX_HEXSZ);
+	*head_hash = find_unique_abbrev(orig_head, GIT_MAX_HEXSZ);
 	*revisions = xstrfmt("%s...%s", oid_to_hex(&base_rev->object.oid),
 						   *head_hash);
 
-	shorthead = find_unique_abbrev(&orig_head, DEFAULT_ABBREV);
+	shorthead = find_unique_abbrev(orig_head, DEFAULT_ABBREV);
 
 	if (upstream) {
 		const char *shortrev;
@@ -314,12 +310,8 @@ static int do_interactive_rebase(struct rebase_options *opts, unsigned flags)
 	struct replay_opts replay = get_replay_opts(opts);
 	struct string_list commands = STRING_LIST_INIT_DUP;
 
-	if (prepare_branch_to_be_rebased(the_repository, &replay,
-					 opts->switch_to))
-		return -1;
-
-	if (get_revision_ranges(opts->upstream, opts->onto, &head_hash,
-				&revisions, &shortrevisions))
+	if (get_revision_ranges(opts->upstream, opts->onto, &opts->orig_head,
+				&head_hash, &revisions, &shortrevisions))
 		return -1;
 
 	if (init_basic_state(&replay,
diff --git a/sequencer.c b/sequencer.c
index b9dbf1adb0..4dc245d7ec 100644
--- a/sequencer.c
+++ b/sequencer.c
@@ -3715,20 +3715,6 @@ static int run_git_checkout(struct repository *r, struct replay_opts *opts,
 	return ret;
 }
 
-int prepare_branch_to_be_rebased(struct repository *r, struct replay_opts *opts,
-				 const char *commit)
-{
-	const char *action;
-
-	if (commit && *commit) {
-		action = reflog_message(opts, "start", "checkout %s", commit);
-		if (run_git_checkout(r, opts, commit, action))
-			return error(_("could not checkout %s"), commit);
-	}
-
-	return 0;
-}
-
 static int checkout_onto(struct repository *r, struct replay_opts *opts,
 			 const char *onto_name, const struct object_id *onto,
 			 const char *orig_head)
diff --git a/sequencer.h b/sequencer.h
index 9f9ae291e3..74f1e2673e 100644
--- a/sequencer.h
+++ b/sequencer.h
@@ -190,9 +190,6 @@ void commit_post_rewrite(struct repository *r,
 			 const struct commit *current_head,
 			 const struct object_id *new_head);
 
-int prepare_branch_to_be_rebased(struct repository *r, struct replay_opts *opts,
-				 const char *commit);
-
 #define SUMMARY_INITIAL_COMMIT   (1 << 0)
 #define SUMMARY_SHOW_AUTHOR_DATE (1 << 1)
 void print_commit_summary(struct repository *repo,
-- 
2.24.1


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH v2] rebase -i: stop checking out the tip of the branch to rebase
  2020-01-24 14:45         ` [PATCH v2] " Alban Gruin
  2020-01-24 14:55           ` Alban Gruin
  2020-01-24 15:05           ` [PATCH v3] " Alban Gruin
@ 2020-01-24 17:11           ` Andrei Rybak
  2 siblings, 0 replies; 22+ messages in thread
From: Andrei Rybak @ 2020-01-24 17:11 UTC (permalink / raw)
  To: Alban Gruin, SZEDER Gábor, git, Elijah Newren
  Cc: Ævar Arnfjörð, Eugeniu Rosca, Junio C Hamano,
	Jeff King, Eugeniu Rosca, Phillip Wood, Johannes Schindelin

On 2020-01-24 15:45, Alban Gruin wrote:
> Notes:
>     Changes since v1:
>     
>      - The first version of the commit message talked specifically about
>        `rebase -i', but this problem is common to all sequencer-based
>        rebases.  The first paragraph has been reworded to clear up the
>        confusion.

Would it make sense to update the subject line as well?

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v2] rebase -i: stop checking out the tip of the branch to rebase
  2020-01-24 14:55           ` Alban Gruin
@ 2020-01-24 18:12             ` Junio C Hamano
  0 siblings, 0 replies; 22+ messages in thread
From: Junio C Hamano @ 2020-01-24 18:12 UTC (permalink / raw)
  To: Alban Gruin
  Cc: SZEDER Gábor, git, Elijah Newren,
	Ævar Arnfjörð, Eugeniu Rosca, Jeff King,
	Eugeniu Rosca, Phillip Wood, Johannes Schindelin

Alban Gruin <alban.gruin@gmail.com> writes:

> Forget this patch, I forgot to clearly say that the `am' backend is not
> affected.

The phrase "is not affected" makes it sound like "Do not worry, I
made sure it is not broken by this patch", but I do not think that
is the more important part ;-).  

The shared codepath for all types of rebase before dispatching
already knew what commit at the tip of the branch being rebased is,
but the sequencer-based backend was doing unnecessary work to figure
it out again by checking it out.  And this patch is about fixing
that, isn't it?

So I do not think singling out 'am' is a good use of readers' time.
The first paragraph can be further tweaked why the extra checkout is
unneeded.

    Before dispatching the control to one of the individual rebase
    backends, the shared codepath in "rebase" figures out what
    branch is being rebased, because it is necessary to compute the
    range of commits to replay to run any backend.  The rebase
    backend based on the sequencer machinery (used for '-i', '-r'
    and '-m') however computed this commit range by actually
    checking out the branch and reading HEAD, which was unnecessary,
    as the working tree is then immediately gets reset to that of
    the commit on which rebased history is built (aka "onto"
    commit).

or something along the line, perhaps?

With this patch applied, the wasteful prepare_branch_to_be_rebased()
has no caller, and the patch removes it from sequencer.[ch] as well,
which is very good.



^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v3] rebase -i: stop checking out the tip of the branch to rebase
  2020-01-24 15:05           ` [PATCH v3] " Alban Gruin
@ 2020-01-24 18:30             ` Junio C Hamano
  2020-02-05 14:31             ` Johannes Schindelin
  1 sibling, 0 replies; 22+ messages in thread
From: Junio C Hamano @ 2020-01-24 18:30 UTC (permalink / raw)
  To: Alban Gruin
  Cc: SZEDER Gábor, git, Elijah Newren,
	Ævar Arnfjörð, Eugeniu Rosca, Jeff King,
	Eugeniu Rosca, Phillip Wood, Johannes Schindelin

Alban Gruin <alban.gruin@gmail.com> writes:

> On big repositories, it's a performance penalty: with `rebase -i', the
> user may have to wait before editing the todo list while git is
> extracting the branch silently, and "quiet" rebases will be slower than
> `am'.
>
> Since we already have the oid of the tip of the branch in
> `opts->orig_head', it's useless to switch to this commit.
> ...
>   Before this patch:
>
>     $ time git rebase -m --onto v4.18 463fa44eec2fef50~ 463fa44eec2fef50
>
>     real    0m8,940s
>     user    0m6,830s
>     sys     0m2,121s
>
>   After this patch:
>
>     $ time git rebase -m --onto v4.18 463fa44eec2fef50~ 463fa44eec2fef50
>
>     real    0m1,834s
>     user    0m0,916s
>     sys     0m0,206s
>
> Reported-by: SZEDER Gábor <szeder.dev@gmail.com>
> Signed-off-by: Alban Gruin <alban.gruin@gmail.com>
> ---

Good.

> diff --git a/sequencer.c b/sequencer.c
> index b9dbf1adb0..4dc245d7ec 100644
> --- a/sequencer.c
> +++ b/sequencer.c
> @@ -3715,20 +3715,6 @@ static int run_git_checkout(struct repository *r, struct replay_opts *opts,
>  	return ret;
>  }
>  
> -int prepare_branch_to_be_rebased(struct repository *r, struct replay_opts *opts,
> -				 const char *commit)
> -{
> -	const char *action;
> -
> -	if (commit && *commit) {
> -		action = reflog_message(opts, "start", "checkout %s", commit);
> -		if (run_git_checkout(r, opts, commit, action))
> -			return error(_("could not checkout %s"), commit);
> -	}
> -
> -	return 0;
> -}
> -
>  static int checkout_onto(struct repository *r, struct replay_opts *opts,
>  			 const char *onto_name, const struct object_id *onto,
>  			 const char *orig_head)
> diff --git a/sequencer.h b/sequencer.h
> index 9f9ae291e3..74f1e2673e 100644
> --- a/sequencer.h
> +++ b/sequencer.h
> @@ -190,9 +190,6 @@ void commit_post_rewrite(struct repository *r,
>  			 const struct commit *current_head,
>  			 const struct object_id *new_head);
>  
> -int prepare_branch_to_be_rebased(struct repository *r, struct replay_opts *opts,
> -				 const char *commit);
> -
>  #define SUMMARY_INITIAL_COMMIT   (1 << 0)
>  #define SUMMARY_SHOW_AUTHOR_DATE (1 << 1)
>  void print_commit_summary(struct repository *repo,

Nice to see this helper to go.

Thanks.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v3] rebase -i: stop checking out the tip of the branch to rebase
  2020-01-24 15:05           ` [PATCH v3] " Alban Gruin
  2020-01-24 18:30             ` Junio C Hamano
@ 2020-02-05 14:31             ` Johannes Schindelin
  1 sibling, 0 replies; 22+ messages in thread
From: Johannes Schindelin @ 2020-02-05 14:31 UTC (permalink / raw)
  To: Alban Gruin
  Cc: SZEDER Gábor, git, Elijah Newren,
	Ævar Arnfjörð, Eugeniu Rosca, Junio C Hamano,
	Jeff King, Eugeniu Rosca, Phillip Wood

[-- Attachment #1: Type: text/plain, Size: 8130 bytes --]

Hi Alban,

On Fri, 24 Jan 2020, Alban Gruin wrote:

> One of the first things done when using a sequencer-based
> rebase (ie. `rebase -i', `rebase -r', or `rebase -m') is to make a todo
> list.  This requires knowledge of the commit range to rebase.  To get
> the oid of the last commit of the range, the tip of the branch to rebase
> is checked out with prepare_branch_to_be_rebased(), then the oid of the
> head is read.  After this, the tip of the branch is not even modified.
> The `am' backend, on the other hand, does not check out the branch.
>
> On big repositories, it's a performance penalty: with `rebase -i', the
> user may have to wait before editing the todo list while git is
> extracting the branch silently, and "quiet" rebases will be slower than
> `am'.
>
> Since we already have the oid of the tip of the branch in
> `opts->orig_head', it's useless to switch to this commit.
>
> This removes the call to prepare_branch_to_be_rebased() in
> do_interactive_rebase(), and adds a `orig_head' parameter to
> get_revision_ranges().  prepare_branch_to_be_rebased() is removed as it
> is no longer used.

At this point, I am a bit puzzled as a reader: why can we just drop this?
My immediate reaction was: isn't this required to switch to a new branch
when `switch_to` is non-`NULL`?

So I went digging a little. The `prepare_branch_to_be_rebased()` call was
introduced in 53bbcfbde7c (rebase -i: implement the main part of
interactive rebase as a builtin, 2018-09-27).

And looking at the `git-rebase--interactive` part of that patch, it
becomes relatively obvious that we inherited this behavior from the shell
scripting days.

2c58483a598 (rebase -i: rewrite setup_reflog_action() in C, 2018-08-10)
converted the `setup_reflog_action` function (which oddly enough not only
set up the reflog action, but also switched to a new branch if so
configured). That function was introduced in d48f97aa854 (rebase: reindent
function git_rebase__interactive, 2018-03-23), but that was _still_ not
the commit that introduced that "let's check out the upstream" behavior.

It goes _all_ the way back to 1b1dce4bae7 (Teach rebase an interactive
mode, 2007-06-25). Except that back then, it was only done when a branch
name was provided (`git rebase -i <upstream> <branch-to-switch-to>`). So
it behaved correctly.

The problem was introduced in 71786f54c41 (rebase: factor out reference
parsing, 2011-02-06), as it substituted the `if test ! -z "$1"` with `if
test ! -z "$switch_to"`, relying on the command-line parsing of
`git-rebase.sh`.

But wait! Wait, wait, wait! `switch_to` is still only set in that
incantation where we provide a branch name _in addition to_ an upstream
commit.

Ah, I think I slowly see where this is going. The problem is actually
2ec33cdd19b (rebase--interactive: don't require what's rebased to be a
branch, 2010-03-14) which failed to realize that essentially the entire
`git checkout` was necessary to accommodate the subsequent call a mere 8
lines further down from that `checkout`:

		git symbolic-ref HEAD > "$DOTEST"/head-name 2> /dev/null ||
                        echo "detached HEAD" > "$DOTEST"/head-name

So that 2ec33cdd19b commit could have saved itself a lot of trouble by
realizing what the role of that `git checkout` is, and should have pulled
that `head-name` logic into that conditional instead of _actually_
switching to a new branch.

Now, let's see what the C code does to determine "head-name". Indeed, it
is already handled in the option parsing, in that monster of a function
called `cmd_rebase()`.

And yes, I think that `head_name` should also be mentioned in this commit
message, as something like

	git rebase -i <base> <branch-to-switch-to>

should eventually indeed switch to the specified branch, and this here
patch does _not_ break that promise.

This is my only concern with this patch, though, therefore:

Acked-by: Johannes Schindelin <johannes.schindelin@gmx.de>

Thanks,
Dscho

> This introduces a visible change: as we do not switch on the tip of the
> branch to rebase, no reflog entry is created at the beginning of the
> rebase for it.
>
> Unscientific performance measurements, performed on linux.git, are as
> follow:
>
>   Before this patch:
>
>     $ time git rebase -m --onto v4.18 463fa44eec2fef50~ 463fa44eec2fef50
>
>     real    0m8,940s
>     user    0m6,830s
>     sys     0m2,121s
>
>   After this patch:
>
>     $ time git rebase -m --onto v4.18 463fa44eec2fef50~ 463fa44eec2fef50
>
>     real    0m1,834s
>     user    0m0,916s
>     sys     0m0,206s
>
> Reported-by: SZEDER Gábor <szeder.dev@gmail.com>
> Signed-off-by: Alban Gruin <alban.gruin@gmail.com>
> ---
>
> Added a line in the first paragraph to make it clear that the `am'
> backend is not affected.
>
>  builtin/rebase.c | 18 +++++-------------
>  sequencer.c      | 14 --------------
>  sequencer.h      |  3 ---
>  3 files changed, 5 insertions(+), 30 deletions(-)
>
> diff --git a/builtin/rebase.c b/builtin/rebase.c
> index 8081741f8a..6154ad8fa5 100644
> --- a/builtin/rebase.c
> +++ b/builtin/rebase.c
> @@ -246,21 +246,17 @@ static int edit_todo_file(unsigned flags)
>  }
>
>  static int get_revision_ranges(struct commit *upstream, struct commit *onto,
> -			       const char **head_hash,
> +			       struct object_id *orig_head, const char **head_hash,
>  			       char **revisions, char **shortrevisions)
>  {
>  	struct commit *base_rev = upstream ? upstream : onto;
>  	const char *shorthead;
> -	struct object_id orig_head;
> -
> -	if (get_oid("HEAD", &orig_head))
> -		return error(_("no HEAD?"));
>
> -	*head_hash = find_unique_abbrev(&orig_head, GIT_MAX_HEXSZ);
> +	*head_hash = find_unique_abbrev(orig_head, GIT_MAX_HEXSZ);
>  	*revisions = xstrfmt("%s...%s", oid_to_hex(&base_rev->object.oid),
>  						   *head_hash);
>
> -	shorthead = find_unique_abbrev(&orig_head, DEFAULT_ABBREV);
> +	shorthead = find_unique_abbrev(orig_head, DEFAULT_ABBREV);
>
>  	if (upstream) {
>  		const char *shortrev;
> @@ -314,12 +310,8 @@ static int do_interactive_rebase(struct rebase_options *opts, unsigned flags)
>  	struct replay_opts replay = get_replay_opts(opts);
>  	struct string_list commands = STRING_LIST_INIT_DUP;
>
> -	if (prepare_branch_to_be_rebased(the_repository, &replay,
> -					 opts->switch_to))
> -		return -1;
> -
> -	if (get_revision_ranges(opts->upstream, opts->onto, &head_hash,
> -				&revisions, &shortrevisions))
> +	if (get_revision_ranges(opts->upstream, opts->onto, &opts->orig_head,
> +				&head_hash, &revisions, &shortrevisions))
>  		return -1;
>
>  	if (init_basic_state(&replay,
> diff --git a/sequencer.c b/sequencer.c
> index b9dbf1adb0..4dc245d7ec 100644
> --- a/sequencer.c
> +++ b/sequencer.c
> @@ -3715,20 +3715,6 @@ static int run_git_checkout(struct repository *r, struct replay_opts *opts,
>  	return ret;
>  }
>
> -int prepare_branch_to_be_rebased(struct repository *r, struct replay_opts *opts,
> -				 const char *commit)
> -{
> -	const char *action;
> -
> -	if (commit && *commit) {
> -		action = reflog_message(opts, "start", "checkout %s", commit);
> -		if (run_git_checkout(r, opts, commit, action))
> -			return error(_("could not checkout %s"), commit);
> -	}
> -
> -	return 0;
> -}
> -
>  static int checkout_onto(struct repository *r, struct replay_opts *opts,
>  			 const char *onto_name, const struct object_id *onto,
>  			 const char *orig_head)
> diff --git a/sequencer.h b/sequencer.h
> index 9f9ae291e3..74f1e2673e 100644
> --- a/sequencer.h
> +++ b/sequencer.h
> @@ -190,9 +190,6 @@ void commit_post_rewrite(struct repository *r,
>  			 const struct commit *current_head,
>  			 const struct object_id *new_head);
>
> -int prepare_branch_to_be_rebased(struct repository *r, struct replay_opts *opts,
> -				 const char *commit);
> -
>  #define SUMMARY_INITIAL_COMMIT   (1 << 0)
>  #define SUMMARY_SHOW_AUTHOR_DATE (1 << 1)
>  void print_commit_summary(struct repository *repo,
> --
> 2.24.1
>
>

^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2020-02-05 14:31 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-01-08 21:43 Unreliable 'git rebase --onto' Eugeniu Rosca
2020-01-08 22:35 ` SZEDER Gábor
2020-01-09  0:55   ` Elijah Newren
2020-01-09 15:03     ` SZEDER Gábor
2020-01-09 17:53       ` Elijah Newren
2020-01-21 19:18       ` [PATCH v1] rebase -i: stop checking out the tip of the branch to rebase Alban Gruin
2020-01-21 20:07         ` Elijah Newren
2020-01-22 20:24         ` Junio C Hamano
2020-01-22 20:47         ` Junio C Hamano
2020-01-24 14:45           ` Alban Gruin
2020-01-24 14:45         ` [PATCH v2] " Alban Gruin
2020-01-24 14:55           ` Alban Gruin
2020-01-24 18:12             ` Junio C Hamano
2020-01-24 15:05           ` [PATCH v3] " Alban Gruin
2020-01-24 18:30             ` Junio C Hamano
2020-02-05 14:31             ` Johannes Schindelin
2020-01-24 17:11           ` [PATCH v2] " Andrei Rybak
2020-01-09 11:13   ` Unreliable 'git rebase --onto' Eugeniu Rosca
     [not found] ` <CABPp-BHsyMOz+hi7EYoAnAWfzms7FRfwqCoarnu8H+vyDoN6SQ@mail.gmail.com>
2020-01-09 10:53   ` Eugeniu Rosca
2020-01-09 18:05     ` Elijah Newren
2020-01-10  0:06       ` Eugeniu Rosca
2020-01-10  2:35         ` Elijah Newren

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).