git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* git 2.24: git revert <commit1> <commit2> requires extra '--continue'?
@ 2019-11-22 23:10 Brian Norris
  2019-11-23  0:34 ` SZEDER Gábor
  0 siblings, 1 reply; 16+ messages in thread
From: Brian Norris @ 2019-11-22 23:10 UTC (permalink / raw)
  To: git

Hi! I'm using git 2.24 (or, a variant of that packaged my distro -- I
can try to build my own if this is deemed not reproducible), and I
feel like I've been seeing a regression here:

Previously, when reverting multiple commits (with the default --edit
behavior), as soon as I'm done editing the first revert commit
message, the second revert commit pops up an editor, and I'm on my
way. Now, it seems to require an extra 'git revert --continue' prompt
in between, yet it doesn't actually recommend that. This is highly
confusing, and seemingly unnecessary.

An annotated transcript provided below.

Note that none of this happens with --no-edit; my reverts happen
smoothly, with no extra need for --continue.

Regards,
Brian

$ mkdir tmp
$ cd tmp
/tmp$ git init
Initialized empty Git repository in [...]/tmp/.git/
/tmp$ touch foo
/tmp$ git add foo
/tmp$ echo bar >> foo
/tmp$ git commit -am baz
[master (root-commit) a388f78d9013] baz
 1 file changed, 1 insertion(+)
 create mode 100644 foo
/tmp$ echo pow >> foo
/tmp$ git commit -am whizzbang
[master 51753222dd9a] whizzbang
 1 file changed, 1 insertion(+)
/tmp$ echo pop >> foo
/tmp$ git commit -am nothing
[master 7153607b11e0] nothing
 1 file changed, 1 insertion(+)
/tmp$ git revert HEAD HEAD^
## EDITOR pops up, as expected
[master 586469974ec2] Revert "nothing"
 1 file changed, 1 deletion(-)
On branch master
Revert currently in progress.

nothing to commit, working tree clean
## Unexpected, confusing pause? No prompt to '--continue'
/tmp$ git log --oneline
586469974ec2 (HEAD -> master) Revert "nothing"
7153607b11e0 nothing
51753222dd9a whizzbang
a388f78d9013 baz
/tmp$ git status
On branch master
Revert currently in progress.
  (run "git revert --continue" to continue)
  (use "git revert --skip" to skip this patch)
  (use "git revert --abort" to cancel the revert operation)

nothing to commit, working tree clean
/tmp$ git revert --continue
## EDITOR pops up, as expected
[master b8dd23f61d07] Revert "whizzbang"
 1 file changed, 1 deletion(-)
On branch master
Revert currently in progress.

nothing to commit, working tree clean
## Unexpected state; both reverts happened, but revert is still in progress?
/tmp$ git log --oneline
b8dd23f61d07 (HEAD -> master) Revert "whizzbang"
586469974ec2 Revert "nothing"
7153607b11e0 nothing
51753222dd9a whizzbang
a388f78d9013 baz
/tmp$ git status
On branch master
Revert currently in progress.
  (run "git revert --continue" to continue)
  (use "git revert --skip" to skip this patch)
  (use "git revert --abort" to cancel the revert operation)

nothing to commit, working tree clean
## OK...I'll run it one more time.
/tmp$ git revert --continue
/tmp$ git status
On branch master
nothing to commit, working tree clean
## *Now* I'm done
/tmp$ git log --oneline
b8dd23f61d07 (HEAD -> master) Revert "whizzbang"
586469974ec2 Revert "nothing"
7153607b11e0 nothing
51753222dd9a whizzbang
a388f78d9013 baz

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: git 2.24: git revert <commit1> <commit2> requires extra '--continue'?
  2019-11-22 23:10 git 2.24: git revert <commit1> <commit2> requires extra '--continue'? Brian Norris
@ 2019-11-23  0:34 ` SZEDER Gábor
  2019-11-23  9:53   ` Phillip Wood
  0 siblings, 1 reply; 16+ messages in thread
From: SZEDER Gábor @ 2019-11-23  0:34 UTC (permalink / raw)
  To: Brian Norris; +Cc: git, Phillip Wood

On Fri, Nov 22, 2019 at 03:10:20PM -0800, Brian Norris wrote:
> Hi! I'm using git 2.24 (or, a variant of that packaged my distro -- I
> can try to build my own if this is deemed not reproducible), and I
> feel like I've been seeing a regression here:
> 
> Previously, when reverting multiple commits (with the default --edit
> behavior), as soon as I'm done editing the first revert commit
> message, the second revert commit pops up an editor, and I'm on my
> way. Now, it seems to require an extra 'git revert --continue' prompt
> in between, yet it doesn't actually recommend that. This is highly
> confusing, and seemingly unnecessary.

Thanks for the report, this is indeed a regression in v2.24.0: it
bisects down to a47ba3c777 (rebase -i: check for updated todo after
squash and reword, 2019-08-19) [Cc'ing Phillip].  It's not specific to
'git revert', but with a slight variation affects 'git cherry-pick' as
well.

> An annotated transcript provided below.

I transcribed your transcript into tests that can be run in our test
framework and demonstrate this regression:

  --- >8 ---

#!/bin/sh

test_description='test'

. ./test-lib.sh

test_expect_success "Brian's revert regression" '
	test_create_repo revert &&
	(
		cd revert &&

		echo 1 >file &&
		git add file &&
		git commit -m first &&
		echo 2 >file &&
		git commit -am second &&
		echo 3 >file &&
		git commit -am third &&

		git checkout -b branch &&

		git revert --edit HEAD HEAD^ &&

		echo 1 >expect &&
		test_cmp expect file
	)
'

test_expect_success "a variant of Brian's regression for cherry-pick" '
	test_create_repo cherry-pick &&
	(
		cd cherry-pick &&

		echo 1 >file &&
		git add file &&
		git commit -m first &&
		echo 2 >file &&
		git commit -am second &&
		echo 3 >file &&
		git commit -am third &&

		git checkout -b branch HEAD^^ &&

		git cherry-pick --edit master^ master &&

		echo 3 >expect &&
		test_cmp expect file
	)
'

test_done

  ---  >8  ---

They both succeed on a47ba3c777's parent, but fail on a47ba3c777 when
the 'git revert' or 'git cherry-pick' commands return with exit code 1
after reverting/cherry-picking the first commit instead of processing
the second commit:

  + git revert --edit HEAD HEAD^
  [branch 88ea48c] Revert "third"
   Author: A U Thor <author@example.com>
   1 file changed, 1 insertion(+), 1 deletion(-)
  On branch branch
  Revert currently in progress.
  
  nothing to commit, working tree clean
  error: last command exited with $?=1
  not ok 1 - Brian's revert regression



  + git cherry-pick --edit master^ master
  [branch 2cb3f74] second
   Author: A U Thor <author@example.com>
   Date: Sat Nov 23 00:17:32 2019 +0000
   1 file changed, 1 insertion(+), 1 deletion(-)
  On branch branch
  Cherry-pick currently in progress.
  
  nothing to commit, working tree clean
  The previous cherry-pick is now empty, possibly due to conflict
  resolution.
  If you wish to commit it anyway, use:
  
      git commit --allow-empty
  
  If you wish to skip this commit, use:
  
      git reset
  
  Then "git cherry-pick --continue" will resume cherry-picking
  the remaining commits.
  error: last command exited with $?=1
  not ok 2 - a variant of Brian's regression for cherry-pick


These test should probably be squeezed into
't3508-cherry-pick-many-commits.sh', but Friday has just turned into
Saturday around here, so that's enough from me for now.


> Note that none of this happens with --no-edit; my reverts happen
> smoothly, with no extra need for --continue.
> 
> Regards,
> Brian
> 
> $ mkdir tmp
> $ cd tmp
> /tmp$ git init
> Initialized empty Git repository in [...]/tmp/.git/
> /tmp$ touch foo
> /tmp$ git add foo
> /tmp$ echo bar >> foo
> /tmp$ git commit -am baz
> [master (root-commit) a388f78d9013] baz
>  1 file changed, 1 insertion(+)
>  create mode 100644 foo
> /tmp$ echo pow >> foo
> /tmp$ git commit -am whizzbang
> [master 51753222dd9a] whizzbang
>  1 file changed, 1 insertion(+)
> /tmp$ echo pop >> foo
> /tmp$ git commit -am nothing
> [master 7153607b11e0] nothing
>  1 file changed, 1 insertion(+)
> /tmp$ git revert HEAD HEAD^
> ## EDITOR pops up, as expected
> [master 586469974ec2] Revert "nothing"
>  1 file changed, 1 deletion(-)
> On branch master
> Revert currently in progress.
> 
> nothing to commit, working tree clean
> ## Unexpected, confusing pause? No prompt to '--continue'
> /tmp$ git log --oneline
> 586469974ec2 (HEAD -> master) Revert "nothing"
> 7153607b11e0 nothing
> 51753222dd9a whizzbang
> a388f78d9013 baz
> /tmp$ git status
> On branch master
> Revert currently in progress.
>   (run "git revert --continue" to continue)
>   (use "git revert --skip" to skip this patch)
>   (use "git revert --abort" to cancel the revert operation)
> 
> nothing to commit, working tree clean
> /tmp$ git revert --continue
> ## EDITOR pops up, as expected
> [master b8dd23f61d07] Revert "whizzbang"
>  1 file changed, 1 deletion(-)
> On branch master
> Revert currently in progress.
> 
> nothing to commit, working tree clean
> ## Unexpected state; both reverts happened, but revert is still in progress?
> /tmp$ git log --oneline
> b8dd23f61d07 (HEAD -> master) Revert "whizzbang"
> 586469974ec2 Revert "nothing"
> 7153607b11e0 nothing
> 51753222dd9a whizzbang
> a388f78d9013 baz
> /tmp$ git status
> On branch master
> Revert currently in progress.
>   (run "git revert --continue" to continue)
>   (use "git revert --skip" to skip this patch)
>   (use "git revert --abort" to cancel the revert operation)
> 
> nothing to commit, working tree clean
> ## OK...I'll run it one more time.
> /tmp$ git revert --continue
> /tmp$ git status
> On branch master
> nothing to commit, working tree clean
> ## *Now* I'm done
> /tmp$ git log --oneline
> b8dd23f61d07 (HEAD -> master) Revert "whizzbang"
> 586469974ec2 Revert "nothing"
> 7153607b11e0 nothing
> 51753222dd9a whizzbang
> a388f78d9013 baz

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: git 2.24: git revert <commit1> <commit2> requires extra '--continue'?
  2019-11-23  0:34 ` SZEDER Gábor
@ 2019-11-23  9:53   ` Phillip Wood
  2019-11-23 17:20     ` [PATCH] sequencer: don't re-read todo for revert and cherry-pick SZEDER Gábor
  0 siblings, 1 reply; 16+ messages in thread
From: Phillip Wood @ 2019-11-23  9:53 UTC (permalink / raw)
  To: SZEDER Gábor, Brian Norris; +Cc: git, Phillip Wood

Hi Brian & Gábor

Thanks to Brian for reporting this and Gábor for bisecting and creating 
the tests.

On 23/11/2019 00:34, SZEDER Gábor wrote:
> On Fri, Nov 22, 2019 at 03:10:20PM -0800, Brian Norris wrote:
>> Hi! I'm using git 2.24 (or, a variant of that packaged my distro -- I
>> can try to build my own if this is deemed not reproducible), and I
>> feel like I've been seeing a regression here:
>>
>> Previously, when reverting multiple commits (with the default --edit
>> behavior), as soon as I'm done editing the first revert commit
>> message, the second revert commit pops up an editor, and I'm on my
>> way. Now, it seems to require an extra 'git revert --continue' prompt
>> in between, yet it doesn't actually recommend that. This is highly
>> confusing, and seemingly unnecessary.
> 
> Thanks for the report, this is indeed a regression in v2.24.0: it
> bisects down to a47ba3c777 (rebase -i: check for updated todo after
> squash and reword, 2019-08-19) [Cc'ing Phillip].  It's not specific to
> 'git revert', but with a slight variation affects 'git cherry-pick' as
> well.

Thanks, I suspect it's missing an 'if (is_rebase_i(opts))' I'll take a 
look later and come up with a fix

Best Wishes

Phillip

>> An annotated transcript provided below.
> 
> I transcribed your transcript into tests that can be run in our test
> framework and demonstrate this regression:
> 
>    --- >8 ---
> 
> #!/bin/sh
> 
> test_description='test'
> 
> . ./test-lib.sh
> 
> test_expect_success "Brian's revert regression" '
> 	test_create_repo revert &&
> 	(
> 		cd revert &&
> 
> 		echo 1 >file &&
> 		git add file &&
> 		git commit -m first &&
> 		echo 2 >file &&
> 		git commit -am second &&
> 		echo 3 >file &&
> 		git commit -am third &&
> 
> 		git checkout -b branch &&
> 
> 		git revert --edit HEAD HEAD^ &&
> 
> 		echo 1 >expect &&
> 		test_cmp expect file
> 	)
> '
> 
> test_expect_success "a variant of Brian's regression for cherry-pick" '
> 	test_create_repo cherry-pick &&
> 	(
> 		cd cherry-pick &&
> 
> 		echo 1 >file &&
> 		git add file &&
> 		git commit -m first &&
> 		echo 2 >file &&
> 		git commit -am second &&
> 		echo 3 >file &&
> 		git commit -am third &&
> 
> 		git checkout -b branch HEAD^^ &&
> 
> 		git cherry-pick --edit master^ master &&
> 
> 		echo 3 >expect &&
> 		test_cmp expect file
> 	)
> '
> 
> test_done
> 
>    ---  >8  ---
> 
> They both succeed on a47ba3c777's parent, but fail on a47ba3c777 when
> the 'git revert' or 'git cherry-pick' commands return with exit code 1
> after reverting/cherry-picking the first commit instead of processing
> the second commit:
> 
>    + git revert --edit HEAD HEAD^
>    [branch 88ea48c] Revert "third"
>     Author: A U Thor <author@example.com>
>     1 file changed, 1 insertion(+), 1 deletion(-)
>    On branch branch
>    Revert currently in progress.
>    
>    nothing to commit, working tree clean
>    error: last command exited with $?=1
>    not ok 1 - Brian's revert regression
> 
> 
> 
>    + git cherry-pick --edit master^ master
>    [branch 2cb3f74] second
>     Author: A U Thor <author@example.com>
>     Date: Sat Nov 23 00:17:32 2019 +0000
>     1 file changed, 1 insertion(+), 1 deletion(-)
>    On branch branch
>    Cherry-pick currently in progress.
>    
>    nothing to commit, working tree clean
>    The previous cherry-pick is now empty, possibly due to conflict
>    resolution.
>    If you wish to commit it anyway, use:
>    
>        git commit --allow-empty
>    
>    If you wish to skip this commit, use:
>    
>        git reset
>    
>    Then "git cherry-pick --continue" will resume cherry-picking
>    the remaining commits.
>    error: last command exited with $?=1
>    not ok 2 - a variant of Brian's regression for cherry-pick
> 
> 
> These test should probably be squeezed into
> 't3508-cherry-pick-many-commits.sh', but Friday has just turned into
> Saturday around here, so that's enough from me for now.
> 
> 
>> Note that none of this happens with --no-edit; my reverts happen
>> smoothly, with no extra need for --continue.
>>
>> Regards,
>> Brian
>>
>> $ mkdir tmp
>> $ cd tmp
>> /tmp$ git init
>> Initialized empty Git repository in [...]/tmp/.git/
>> /tmp$ touch foo
>> /tmp$ git add foo
>> /tmp$ echo bar >> foo
>> /tmp$ git commit -am baz
>> [master (root-commit) a388f78d9013] baz
>>   1 file changed, 1 insertion(+)
>>   create mode 100644 foo
>> /tmp$ echo pow >> foo
>> /tmp$ git commit -am whizzbang
>> [master 51753222dd9a] whizzbang
>>   1 file changed, 1 insertion(+)
>> /tmp$ echo pop >> foo
>> /tmp$ git commit -am nothing
>> [master 7153607b11e0] nothing
>>   1 file changed, 1 insertion(+)
>> /tmp$ git revert HEAD HEAD^
>> ## EDITOR pops up, as expected
>> [master 586469974ec2] Revert "nothing"
>>   1 file changed, 1 deletion(-)
>> On branch master
>> Revert currently in progress.
>>
>> nothing to commit, working tree clean
>> ## Unexpected, confusing pause? No prompt to '--continue'
>> /tmp$ git log --oneline
>> 586469974ec2 (HEAD -> master) Revert "nothing"
>> 7153607b11e0 nothing
>> 51753222dd9a whizzbang
>> a388f78d9013 baz
>> /tmp$ git status
>> On branch master
>> Revert currently in progress.
>>    (run "git revert --continue" to continue)
>>    (use "git revert --skip" to skip this patch)
>>    (use "git revert --abort" to cancel the revert operation)
>>
>> nothing to commit, working tree clean
>> /tmp$ git revert --continue
>> ## EDITOR pops up, as expected
>> [master b8dd23f61d07] Revert "whizzbang"
>>   1 file changed, 1 deletion(-)
>> On branch master
>> Revert currently in progress.
>>
>> nothing to commit, working tree clean
>> ## Unexpected state; both reverts happened, but revert is still in progress?
>> /tmp$ git log --oneline
>> b8dd23f61d07 (HEAD -> master) Revert "whizzbang"
>> 586469974ec2 Revert "nothing"
>> 7153607b11e0 nothing
>> 51753222dd9a whizzbang
>> a388f78d9013 baz
>> /tmp$ git status
>> On branch master
>> Revert currently in progress.
>>    (run "git revert --continue" to continue)
>>    (use "git revert --skip" to skip this patch)
>>    (use "git revert --abort" to cancel the revert operation)
>>
>> nothing to commit, working tree clean
>> ## OK...I'll run it one more time.
>> /tmp$ git revert --continue
>> /tmp$ git status
>> On branch master
>> nothing to commit, working tree clean
>> ## *Now* I'm done
>> /tmp$ git log --oneline
>> b8dd23f61d07 (HEAD -> master) Revert "whizzbang"
>> 586469974ec2 Revert "nothing"
>> 7153607b11e0 nothing
>> 51753222dd9a whizzbang
>> a388f78d9013 baz

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH] sequencer: don't re-read todo for revert and cherry-pick
  2019-11-23  9:53   ` Phillip Wood
@ 2019-11-23 17:20     ` SZEDER Gábor
  2019-11-23 21:14       ` Johannes Schindelin
  2019-11-24  4:49       ` Junio C Hamano
  0 siblings, 2 replies; 16+ messages in thread
From: SZEDER Gábor @ 2019-11-23 17:20 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Brian Norris, Phillip Wood, SZEDER Gábor

When 'git revert' or 'git cherry-pick --edit' is invoked with multiple
commits, then after editing the first commit message is finished both
these commands should continue with processing the second commit and
launch another editor for its commit message, assuming there are
no conflicts, of course.

Alas, this inadvertently changed with commit a47ba3c777 (rebase -i:
check for updated todo after squash and reword, 2019-08-19): after
editing the first commit message is finished, both 'git revert' and
'git cherry-pick --edit' exit with error, claiming that "nothing to
commit, working tree clean".

The reason for the changed behaviour is twofold:

  - Prior to a47ba3c777 the up-to-dateness of the todo list file was
    only checked after 'exec' instructions, and that commit moved
    those checks to the common code path.  The intention was that this
    check should be performed after instructions spawning an editor
    ('squash' and 'reword') as well, so the ongoing 'rebase -i'
    notices when the user runs a 'git rebase --edit-todo' while
    squashing/rewording a commit message.

    However, as it happened that check is now performed even after
    'revert' and 'pick' instructions when they involved editing the
    commit message.  And 'revert' by default while 'pick' optionally
    (with 'git cherry-pick --edit') involves editing the commit
    message.

  - When invoking 'git revert' or 'git cherry-pick --edit' with
    multiple commits they don't read a todo list file but assemble the
    todo list in memory, thus the associated stat data used to check
    whether the file has been updated is all zeroed out initially.

    Then the sequencer writes all instructions (including the very
    first) to the todo file, executes the first 'revert/pick'
    instruction, and after the user finished editing the commit
    message the changes of a47ba3c777 kick in, and it checks whether
    the todo file has been modified.  The initial all-zero stat data
    obviously differs from the todo file's current stat data, so the
    sequencer concludes that the file has been modified.  Technically
    it is not wrong, of course, because the file just has been written
    indeed by the sequencer itself, though the file's contents still
    match what the sequencer was invoked with in the beginning.
    Consequently, after re-reading the todo file the sequencer
    executes the same first instruction _again_, thus ending up in
    that "nothing to commit" situation.

The todo list was never meant to be edited during multi-commit 'git
revert' or 'cherry-pick' operations, so perform that "has the todo
file been modified" check only when the sequencer was invoked as part
of an interactive rebase.

Reported-by: Brian Norris <briannorris@chromium.org>
Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
---

On Sat, Nov 23, 2019 at 09:53:51AM +0000, Phillip Wood wrote:
> Thanks, I suspect it's missing an 'if (is_rebase_i(opts))' I'll take a look
> later and come up with a fix

That missing condition was my hunch yesterday evening as well, and
while it did fix my tests and didn't break anything else, it took some
time to wrap my head around some of the subtleties that are going on
in the sequencer.  That's why the commit message ended up this long as
it did.

In the end I decided to add the new tests to
't3429-rebase-edit-todo.sh', because, though these new tests don't
actually check 'rebase', that is the one test script focusing on
(re-)reading the todo file in particular.

 sequencer.c                 |  2 +-
 t/t3429-rebase-edit-todo.sh | 30 ++++++++++++++++++++++++++++++
 2 files changed, 31 insertions(+), 1 deletion(-)

diff --git a/sequencer.c b/sequencer.c
index 2adcf5a639..3b05d0277d 100644
--- a/sequencer.c
+++ b/sequencer.c
@@ -3791,7 +3791,7 @@ static int pick_commits(struct repository *r,
 							item->commit,
 							arg, item->arg_len,
 							opts, res, 0);
-		} else if (check_todo && !res) {
+		} else if (is_rebase_i(opts) && check_todo && !res) {
 			struct stat st;
 
 			if (stat(get_todo_path(opts), &st)) {
diff --git a/t/t3429-rebase-edit-todo.sh b/t/t3429-rebase-edit-todo.sh
index 8739cb60a7..1679f2563d 100755
--- a/t/t3429-rebase-edit-todo.sh
+++ b/t/t3429-rebase-edit-todo.sh
@@ -52,4 +52,34 @@ test_expect_success 'todo is re-read after reword and squash' '
 	test_cmp expected actual
 '
 
+test_expect_success 're-reading todo doesnt interfere with revert --edit' '
+	git reset --hard third &&
+
+	git revert --edit third second &&
+
+	cat >expect <<-\EOF &&
+	Revert "second"
+	Revert "third"
+	third
+	second
+	first
+	EOF
+	git log --format="%s" >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 're-reading todo doesnt interfere with cherry-pick --edit' '
+	git reset --hard first &&
+
+	git cherry-pick --edit second third &&
+
+	cat >expect <<-\EOF &&
+	third
+	second
+	first
+	EOF
+	git log --format="%s" >actual &&
+	test_cmp expect actual
+'
+
 test_done
-- 
2.24.0.532.ge18579ded8


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH] sequencer: don't re-read todo for revert and cherry-pick
  2019-11-23 17:20     ` [PATCH] sequencer: don't re-read todo for revert and cherry-pick SZEDER Gábor
@ 2019-11-23 21:14       ` Johannes Schindelin
  2019-11-24  4:49       ` Junio C Hamano
  1 sibling, 0 replies; 16+ messages in thread
From: Johannes Schindelin @ 2019-11-23 21:14 UTC (permalink / raw)
  To: SZEDER Gábor; +Cc: Junio C Hamano, git, Brian Norris, Phillip Wood

[-- Attachment #1: Type: text/plain, Size: 5546 bytes --]

Hi,

On Sat, 23 Nov 2019, SZEDER Gábor wrote:

> When 'git revert' or 'git cherry-pick --edit' is invoked with multiple
> commits, then after editing the first commit message is finished both
> these commands should continue with processing the second commit and
> launch another editor for its commit message, assuming there are
> no conflicts, of course.
>
> Alas, this inadvertently changed with commit a47ba3c777 (rebase -i:
> check for updated todo after squash and reword, 2019-08-19): after
> editing the first commit message is finished, both 'git revert' and
> 'git cherry-pick --edit' exit with error, claiming that "nothing to
> commit, working tree clean".
>
> The reason for the changed behaviour is twofold:
>
>   - Prior to a47ba3c777 the up-to-dateness of the todo list file was
>     only checked after 'exec' instructions, and that commit moved
>     those checks to the common code path.  The intention was that this
>     check should be performed after instructions spawning an editor
>     ('squash' and 'reword') as well, so the ongoing 'rebase -i'
>     notices when the user runs a 'git rebase --edit-todo' while
>     squashing/rewording a commit message.
>
>     However, as it happened that check is now performed even after
>     'revert' and 'pick' instructions when they involved editing the
>     commit message.  And 'revert' by default while 'pick' optionally
>     (with 'git cherry-pick --edit') involves editing the commit
>     message.
>
>   - When invoking 'git revert' or 'git cherry-pick --edit' with
>     multiple commits they don't read a todo list file but assemble the
>     todo list in memory, thus the associated stat data used to check
>     whether the file has been updated is all zeroed out initially.
>
>     Then the sequencer writes all instructions (including the very
>     first) to the todo file, executes the first 'revert/pick'
>     instruction, and after the user finished editing the commit
>     message the changes of a47ba3c777 kick in, and it checks whether
>     the todo file has been modified.  The initial all-zero stat data
>     obviously differs from the todo file's current stat data, so the
>     sequencer concludes that the file has been modified.  Technically
>     it is not wrong, of course, because the file just has been written
>     indeed by the sequencer itself, though the file's contents still
>     match what the sequencer was invoked with in the beginning.
>     Consequently, after re-reading the todo file the sequencer
>     executes the same first instruction _again_, thus ending up in
>     that "nothing to commit" situation.
>
> The todo list was never meant to be edited during multi-commit 'git
> revert' or 'cherry-pick' operations, so perform that "has the todo
> file been modified" check only when the sequencer was invoked as part
> of an interactive rebase.
>
> Reported-by: Brian Norris <briannorris@chromium.org>
> Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
> ---

The patch and the commit message (thank you so much for the detail!) look
very good to me.

> On Sat, Nov 23, 2019 at 09:53:51AM +0000, Phillip Wood wrote:
> > Thanks, I suspect it's missing an 'if (is_rebase_i(opts))' I'll take a look
> > later and come up with a fix
>
> That missing condition was my hunch yesterday evening as well, and
> while it did fix my tests and didn't break anything else, it took some
> time to wrap my head around some of the subtleties that are going on
> in the sequencer.  That's why the commit message ended up this long as
> it did.
>
> In the end I decided to add the new tests to
> 't3429-rebase-edit-todo.sh', because, though these new tests don't
> actually check 'rebase', that is the one test script focusing on
> (re-)reading the todo file in particular.

Yup, makes sense.

Thanks,
Dscho

>
>  sequencer.c                 |  2 +-
>  t/t3429-rebase-edit-todo.sh | 30 ++++++++++++++++++++++++++++++
>  2 files changed, 31 insertions(+), 1 deletion(-)
>
> diff --git a/sequencer.c b/sequencer.c
> index 2adcf5a639..3b05d0277d 100644
> --- a/sequencer.c
> +++ b/sequencer.c
> @@ -3791,7 +3791,7 @@ static int pick_commits(struct repository *r,
>  							item->commit,
>  							arg, item->arg_len,
>  							opts, res, 0);
> -		} else if (check_todo && !res) {
> +		} else if (is_rebase_i(opts) && check_todo && !res) {
>  			struct stat st;
>
>  			if (stat(get_todo_path(opts), &st)) {
> diff --git a/t/t3429-rebase-edit-todo.sh b/t/t3429-rebase-edit-todo.sh
> index 8739cb60a7..1679f2563d 100755
> --- a/t/t3429-rebase-edit-todo.sh
> +++ b/t/t3429-rebase-edit-todo.sh
> @@ -52,4 +52,34 @@ test_expect_success 'todo is re-read after reword and squash' '
>  	test_cmp expected actual
>  '
>
> +test_expect_success 're-reading todo doesnt interfere with revert --edit' '
> +	git reset --hard third &&
> +
> +	git revert --edit third second &&
> +
> +	cat >expect <<-\EOF &&
> +	Revert "second"
> +	Revert "third"
> +	third
> +	second
> +	first
> +	EOF
> +	git log --format="%s" >actual &&
> +	test_cmp expect actual
> +'
> +
> +test_expect_success 're-reading todo doesnt interfere with cherry-pick --edit' '
> +	git reset --hard first &&
> +
> +	git cherry-pick --edit second third &&
> +
> +	cat >expect <<-\EOF &&
> +	third
> +	second
> +	first
> +	EOF
> +	git log --format="%s" >actual &&
> +	test_cmp expect actual
> +'
> +
>  test_done
> --
> 2.24.0.532.ge18579ded8
>
>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] sequencer: don't re-read todo for revert and cherry-pick
  2019-11-23 17:20     ` [PATCH] sequencer: don't re-read todo for revert and cherry-pick SZEDER Gábor
  2019-11-23 21:14       ` Johannes Schindelin
@ 2019-11-24  4:49       ` Junio C Hamano
  2019-11-24 10:44         ` Phillip Wood
  1 sibling, 1 reply; 16+ messages in thread
From: Junio C Hamano @ 2019-11-24  4:49 UTC (permalink / raw)
  To: SZEDER Gábor; +Cc: git, Brian Norris, Phillip Wood

SZEDER Gábor <szeder.dev@gmail.com> writes:

> When 'git revert' or 'git cherry-pick --edit' is invoked with multiple
> commits, then after editing the first commit message is finished both

"commits, then after editing the first commit message, both of" I
would say.

> these commands should continue with processing the second commit and
> launch another editor for its commit message, assuming there are
> no conflicts, of course.
>
> Alas, this inadvertently changed with commit a47ba3c777 (rebase -i:
> check for updated todo after squash and reword, 2019-08-19): after
> editing the first commit message is finished, both 'git revert' and
> 'git cherry-pick --edit' exit with error, claiming that "nothing to
> commit, working tree clean".
> ...
>   - When invoking 'git revert' or 'git cherry-pick --edit' with
>     multiple commits they don't read a todo list file but assemble the
>     todo list in memory, thus the associated stat data used to check
>     whether the file has been updated is all zeroed out initially.
>
>     Then the sequencer writes all instructions (including the very
>     first) to the todo file, executes the first 'revert/pick'
>     instruction, and after the user finished editing the commit
>     message the changes of a47ba3c777 kick in, and it checks whether
>     the todo file has been modified.  The initial all-zero stat data
>     obviously differs from the todo file's current stat data, so the
>     sequencer concludes that the file has been modified.  Technically
>     it is not wrong, of course, because the file just has been written
>     indeed by the sequencer itself, though the file's contents still
>     match what the sequencer was invoked with in the beginning.
>     Consequently, after re-reading the todo file the sequencer
>     executes the same first instruction _again_, thus ending up in
>     that "nothing to commit" situation.

Hmph, that makes it sound as if the right fix is to re-read after
writing the first version of the todo file out, so that the stat
data matches reality and tells us that it has never been modified?

> The todo list was never meant to be edited during multi-commit 'git
> revert' or 'cherry-pick' operations, so perform that "has the todo
> file been modified" check only when the sequencer was invoked as part
> of an interactive rebase.

OK.  That is a valid fix, I think, but the explanation given in the
second bullet point gives a wrong impression that it is merely a
workaround (iow, we are assuming that the user would behave, instead
of making sure we can detect when the user touches the list), when
it is *not*.

> diff --git a/sequencer.c b/sequencer.c
> index 2adcf5a639..3b05d0277d 100644
> --- a/sequencer.c
> +++ b/sequencer.c
> @@ -3791,7 +3791,7 @@ static int pick_commits(struct repository *r,
>  							item->commit,
>  							arg, item->arg_len,
>  							opts, res, 0);
> -		} else if (check_todo && !res) {
> +		} else if (is_rebase_i(opts) && check_todo && !res) {

It is a bit sad that setting of check_todo is not something a single
helper function can decide, so that this is_rebase_i(opts) can be
taken into account when that helper function (the logical place
would be do_pick_commit()) decides to set (or not set) check_todo.

Unfortunately, that is not sufficient, I suspect.  Why did a47ba3c7
("rebase -i: check for updated todo after squash and reword",
2019-08-19) decide to flip check_todo on when running TODO_EXEC?


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] sequencer: don't re-read todo for revert and cherry-pick
  2019-11-24  4:49       ` Junio C Hamano
@ 2019-11-24 10:44         ` Phillip Wood
  2019-11-24 21:10           ` [PATCH] t3429: try to protect against a potential racy todo file problem SZEDER Gábor
  2019-11-25  1:10           ` [PATCH] sequencer: don't re-read todo for revert and cherry-pick Junio C Hamano
  0 siblings, 2 replies; 16+ messages in thread
From: Phillip Wood @ 2019-11-24 10:44 UTC (permalink / raw)
  To: Junio C Hamano, SZEDER Gábor; +Cc: git, Brian Norris, Phillip Wood

On 24/11/2019 04:49, Junio C Hamano wrote:

Thanks for working on this Gábor

> SZEDER Gábor <szeder.dev@gmail.com> writes:
> 
>> When 'git revert' or 'git cherry-pick --edit' is invoked with multiple
>> commits, then after editing the first commit message is finished both
> 
> "commits, then after editing the first commit message, both of" I
> would say.
> 
>> these commands should continue with processing the second commit and
>> launch another editor for its commit message, assuming there are
>> no conflicts, of course.
>>
>> Alas, this inadvertently changed with commit a47ba3c777 (rebase -i:
>> check for updated todo after squash and reword, 2019-08-19): after
>> editing the first commit message is finished, both 'git revert' and
>> 'git cherry-pick --edit' exit with error, claiming that "nothing to
>> commit, working tree clean".
>> ...
>>    - When invoking 'git revert' or 'git cherry-pick --edit' with
>>      multiple commits they don't read a todo list file but assemble the
>>      todo list in memory, thus the associated stat data used to check
>>      whether the file has been updated is all zeroed out initially.
>>
>>      Then the sequencer writes all instructions (including the very
>>      first) to the todo file, executes the first 'revert/pick'
>>      instruction, and after the user finished editing the commit
>>      message the changes of a47ba3c777 kick in, and it checks whether
>>      the todo file has been modified.  The initial all-zero stat data
>>      obviously differs from the todo file's current stat data, so the
>>      sequencer concludes that the file has been modified.  Technically
>>      it is not wrong, of course, because the file just has been written
>>      indeed by the sequencer itself, though the file's contents still
>>      match what the sequencer was invoked with in the beginning.
>>      Consequently, after re-reading the todo file the sequencer
>>      executes the same first instruction _again_, thus ending up in
>>      that "nothing to commit" situation.
> 
> Hmph, that makes it sound as if the right fix is to re-read after
> writing the first version of the todo file out, so that the stat
> data matches reality and tells us that it has never been modified?

I think we should update the stat data after we write the todo list. 
ag/sequencer-todo-updates changes the rebase startup sequence to avoid 
the initial read that populates the stat data. There's a subtle 
difference in the todo list handling between rabese and 
cherry-pick/revert. The list written by rebase does not include the 
commit currently being picked but it does for cherry-pick/revert. This 
means that rebase is not subject to this bug in 
ag/sequencer-todo-updates as although it re-reads the todo list because 
the stat data is zero the list does not contain the commit we've just 
picked.

> 
>> The todo list was never meant to be edited during multi-commit 'git
>> revert' or 'cherry-pick' operations, so perform that "has the todo
>> file been modified" check only when the sequencer was invoked as part
>> of an interactive rebase.
> 
> OK.  That is a valid fix, I think, but the explanation given in the
> second bullet point gives a wrong impression that it is merely a
> workaround (iow, we are assuming that the user would behave, instead
> of making sure we can detect when the user touches the list), when
> it is *not*.
> 

I'd be happier not to set check_todo in the first place unless we're 
rebasing (though one could argue that Gábor's version is more future-proof)

>> diff --git a/sequencer.c b/sequencer.c
>> index 2adcf5a639..3b05d0277d 100644
>> --- a/sequencer.c
>> +++ b/sequencer.c
>> @@ -3791,7 +3791,7 @@ static int pick_commits(struct repository *r,
>>   							item->commit,
>>   							arg, item->arg_len,
>>   							opts, res, 0);
>> -		} else if (check_todo && !res) {
>> +		} else if (is_rebase_i(opts) && check_todo && !res) {
> 
> It is a bit sad that setting of check_todo is not something a single
> helper function can decide, so that this is_rebase_i(opts) can be
> taken into account when that helper function (the logical place
> would be do_pick_commit()) decides to set (or not set) check_todo.
> 
> Unfortunately, that is not sufficient, I suspect.  Why did a47ba3c7
> ("rebase -i: check for updated todo after squash and reword",
> 2019-08-19) decide to flip check_todo on when running TODO_EXEC?

I'm not sure what you mean by this

This is what I had before I saw Gábor's patch (the tests are pretty 
similar but I think we should check that the messages of all the picks 
are actually edited with --edit - that does not seem to be checked by 
the current tests) Note this does not include updating the stat data 
after writing the todo list yet as I've only just thought about that.
Gábor are you happy to take this forward?

Best Wishes

Phillip


--- 8> ---

diff --git a/sequencer.c b/sequencer.c
index f2abe2a366..b363b45366 100644
--- a/sequencer.c
+++ b/sequencer.c
@@ -1999,7 +1999,7 @@ static int do_pick_commit(struct repository *r,
                         res = do_commit(r, msg_file, author, opts, flags);
                 else
                         res = error(_("unable to parse commit author"));
-               *check_todo = !!(flags & EDIT_MSG);
+               *check_todo = is_rebase_i(opts) && !!(flags & EDIT_MSG);
                 if (!res && reword) {
  fast_forward_edit:
                         res = run_git_commit(r, NULL, opts, EDIT_MSG |
diff --git a/t/t3508-cherry-pick-many-commits.sh 
b/t/t3508-cherry-pick-many-commits.sh
index b457333e18..fd9d626779 100755
--- a/t/t3508-cherry-pick-many-commits.sh
+++ b/t/t3508-cherry-pick-many-commits.sh
@@ -40,6 +40,31 @@ test_expect_success 'cherry-pick first..fourth works' '
         check_head_differs_from fourth
  '

+test_expect_success 'cherry-pick --edit first..fourth works' '
+       git checkout -f master &&
+       git reset --hard first &&
+       test_tick &&
+       GIT_EDITOR="echo edited >>" git cherry-pick --edit first..fourth &&
+       git log --pretty=format:%B -3 >actual &&
+       cat >expect <<-\EOF &&
+       fourth
+
+       edited
+
+       third
+
+       edited
+
+       second
+
+       edited
+       EOF
+       test_cmp expect actual &&
+       git diff --quiet other &&
+       git diff --quiet HEAD other &&
+       check_head_differs_from fourth
+'
+
  test_expect_success 'cherry-pick three one two works' '
         git checkout -f first &&
         test_commit one &&
@@ -153,6 +178,37 @@ test_expect_success 'revert first..fourth works' '
         git diff --quiet HEAD first
  '

+test_expect_success 'revert --edit first..fourth works' '
+       git checkout -f master &&
+       git reset --hard fourth &&
+       test_tick &&
+       GIT_EDITOR="echo edited >>" git revert --edit first..fourth &&
+       git log --pretty=format:%B -3 >actual &&
+       cat >expect <<-EOF &&
+       Revert "second"
+
+       This reverts commit $(git rev-parse second).
+
+       edited
+
+       Revert "third"
+
+       This reverts commit $(git rev-parse third).
+
+       edited
'

+test_expect_success 'revert --edit first..fourth works' '
+       git checkout -f master &&
+       git reset --hard fourth &&
+       test_tick &&
+       GIT_EDITOR="echo edited >>" git revert --edit first..fourth &&
+       git log --pretty=format:%B -3 >actual &&
+       cat >expect <<-EOF &&
+       Revert "second"
+
+       This reverts commit $(git rev-parse second).
+
+       edited
+
+       Revert "third"
+
+       This reverts commit $(git rev-parse third).
+
+       edited
+
+       Revert "fourth"
+
+       This reverts commit $(git rev-parse fourth).
+
+       edited
+       EOF
+       test_cmp expect actual &&
+       git diff --quiet first &&
+       git diff --cached --quiet first &&
+       git diff --quiet HEAD first
+'
+
  test_expect_success 'revert ^first fourth works' '
         git checkout -f master &&
         git reset --hard fourth &&




^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH] t3429: try to protect against a potential racy todo file problem
  2019-11-24 10:44         ` Phillip Wood
@ 2019-11-24 21:10           ` SZEDER Gábor
  2019-11-25  1:28             ` Junio C Hamano
                               ` (2 more replies)
  2019-11-25  1:10           ` [PATCH] sequencer: don't re-read todo for revert and cherry-pick Junio C Hamano
  1 sibling, 3 replies; 16+ messages in thread
From: SZEDER Gábor @ 2019-11-24 21:10 UTC (permalink / raw)
  To: phillip.wood; +Cc: Junio C Hamano, git, Brian Norris, Johannes Schindelin

On Sun, Nov 24, 2019 at 10:44:10AM +0000, Phillip Wood wrote:
> On 24/11/2019 04:49, Junio C Hamano wrote:
> 
> Thanks for working on this Gábor
> 
> >SZEDER Gábor <szeder.dev@gmail.com> writes:
> >
> >>When 'git revert' or 'git cherry-pick --edit' is invoked with multiple
> >>commits, then after editing the first commit message is finished both
> >
> >"commits, then after editing the first commit message, both of" I
> >would say.
> >
> >>these commands should continue with processing the second commit and
> >>launch another editor for its commit message, assuming there are
> >>no conflicts, of course.
> >>
> >>Alas, this inadvertently changed with commit a47ba3c777 (rebase -i:
> >>check for updated todo after squash and reword, 2019-08-19): after
> >>editing the first commit message is finished, both 'git revert' and
> >>'git cherry-pick --edit' exit with error, claiming that "nothing to
> >>commit, working tree clean".
> >>...
> >>   - When invoking 'git revert' or 'git cherry-pick --edit' with
> >>     multiple commits they don't read a todo list file but assemble the
> >>     todo list in memory, thus the associated stat data used to check
> >>     whether the file has been updated is all zeroed out initially.
> >>
> >>     Then the sequencer writes all instructions (including the very
> >>     first) to the todo file, executes the first 'revert/pick'
> >>     instruction, and after the user finished editing the commit
> >>     message the changes of a47ba3c777 kick in, and it checks whether
> >>     the todo file has been modified.  The initial all-zero stat data
> >>     obviously differs from the todo file's current stat data, so the
> >>     sequencer concludes that the file has been modified.  Technically
> >>     it is not wrong, of course, because the file just has been written
> >>     indeed by the sequencer itself, though the file's contents still
> >>     match what the sequencer was invoked with in the beginning.
> >>     Consequently, after re-reading the todo file the sequencer
> >>     executes the same first instruction _again_, thus ending up in
> >>     that "nothing to commit" situation.
> >
> >Hmph, that makes it sound as if the right fix is to re-read after
> >writing the first version of the todo file out, so that the stat
> >data matches reality and tells us that it has never been modified?
> 
> I think we should update the stat data after we write the todo list.

Well, yes and no.

No, because we are dealing with regression in v2.24.0 here, so the
simpler the fix the better it is for maint.  I don't think a fix can
get any simpler than my patch, with or without the suggestions from
Phillip.

Yes, we should definitely consider updating the stat data after the
sequencer writes the todo list, or any other options with which the
sequencer could notice a modified todo list file with less subtlety.
Alas, there is a big can of worms in that direction, see the patch
below, and we have to be very careful going that way, so I think it's
only viable in the long term, but less suitable as a regression fix
for maint.

(Hrm, perhaps I spent too many words on the all zeroed out stat data,
and managed to sidetrack you a bit...)

  ---  >8  ---

Subject: [PATCH] t3429: try to protect against a potential racy todo file
 problem

During an interactive rebase session the user can run basically any
commands with an 'exec' instruction, including commands that modify
the todo list of the ongoing rebase.  Similarly, the user might choose
to modify the todo list while editing a commit message for a 'reword'
or a 'squash' instruction.  The ongoing interactive rebase process
should be able to notice the modified todo file and re-read it to
follow the updated instructions.  This has been working well for the
'exec' instruction for some time, but for 'reword' and 'squash' it has
been fixed only recently in a47ba3c777 (rebase -i: check for updated
todo after squash and reword, 2019-08-19).

To notice a changed todo file the sequencer stores the file's stat
data in its 'struct todo_list' instance, and compares it with the
file's current stat data after 'reword', 'squash' and 'exec'
instructions.  If the two stat data doesn't match, it re-reads the
todo file.

Sounds simple, but there are some subtleties going on here:

  - The 'struct todo_list' holds the stat data from the time when the
    todo file was last read.

  - This stat data in 'struct todo_list' is not updated when the
    sequencer itself writes the todo file.

  - Before executing each instruction during an interactive rebase,
    the sequencer always updates the todo file by removing the
    just-about-to-be-executed instruction.  This changes the file's
    size and inode [1].

Consequently, when the sequencer looks at the stat data after a
'reword', 'squash' or 'exec' instruction, it most likely finds that
they differ, even when the user didn't modify the todo list at all!
This is not an issue in practice, it just wastes a few cycles on
re-reading the todo list that matches what the sequencer already has
in memory anyway.

However, an unsuspecting Git developer might try to "fix" it by simply
updating the stat data each time the sequencer writes the todo list
for an interactive rebase.  On first sight it looks quite sensible and
straightforward, but we have to be very careful when doing that,
because potential racy problems lie that way.

It is possible to overwrite the todo list file without modifying
either its inode or size, and if such an overwrite were to happen in
the same second when the file was last read (our stat data has one
second granularity by default), then the actual stat data on the file
system would match the stat data that the sequencer has in memory.
Consequently, such a modification to the todo list file would go
unnoticed.

Add a test to 't3429-rebase-edit-todo.sh' that modifies the todo list
during an interactive rebase in a way that the file's stat data very
likely doesn't change.  Since we have no control over ctime repeat the
test a few times, just like we do in the racy git and racy split index
tests (t0010 and t1701).  This test succeeds right now because of all
the above, but it does fail most of the time if the stat data is
naively updated when writing the todo file [2].  Hopefully this test
will help prevent us from introducing such racy problems if we ever
change how the sequencer looks out for modified todo list files.

This new test only checks what it's supposed to when it doesn't change
the size of the todo file while modifying it, thus it critically
depends on the todo file format.  And the todo file format did change
in the past: early on it stored abbreviated commit object ids, but
since 75c6976655 (rebase -i: fix short SHA-1 collision, 2013-08-23) it
stores full object ids.  Perhaps it will change in the future as well,
so be extra defensive about this, and check that the file size before
and after the edit actually match, and complain with a "bug in the
test script" if they don't.

[1] The todo file is written using the lockfile API, i.e. first to the
    lockfile, which is then moved to place, so the new file can't
    possibly have the same inode as the file it replaces.  Note,
    however, that the file system might reuse inodes, so it is
    possible that the new todo file ends up with the same inode as is
    recorded in the 'struct todo_list' from the last time the file was
    read.

[2] To trigger failure in the new test updating the stat data when
    writing the todo file like the diff below does.  No other test in
    our test suite seems to fail with this change applied.

    diff --git a/sequencer.c b/sequencer.c
    index 3b05d0277d..b2acb7d536 100644
    --- a/sequencer.c
    +++ b/sequencer.c
    @@ -2730,6 +2730,7 @@ static int save_todo(struct todo_list *todo_list, struct replay_opts *opts)
     	struct lock_file todo_lock = LOCK_INIT;
     	const char *todo_path = get_todo_path(opts);
     	int next = todo_list->current, offset, fd;
    +	struct stat st;

     	/*
     	 * rebase -i writes "git-rebase-todo" without the currently executing
    @@ -2748,6 +2749,10 @@ static int save_todo(struct todo_list *todo_list, struct replay_opts *opts)
     	if (commit_lock_file(&todo_lock) < 0)
     		return error(_("failed to finalize '%s'"), todo_path);

    +	if (stat(todo_path, &st) < 0)
    +		return error(_("could not stat '%s'"), todo_path);
    +	fill_stat_data(&todo_list->stat, &st);
    +
     	if (is_rebase_i(opts) && next > 0) {
     		const char *done = rebase_path_done();
     		int fd = open(done, O_CREAT | O_WRONLY | O_APPEND, 0666);

Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
---
 t/t3429-rebase-edit-todo.sh | 59 +++++++++++++++++++++++++++++++++++++
 1 file changed, 59 insertions(+)

diff --git a/t/t3429-rebase-edit-todo.sh b/t/t3429-rebase-edit-todo.sh
index 8739cb60a7..c80c139b47 100755
--- a/t/t3429-rebase-edit-todo.sh
+++ b/t/t3429-rebase-edit-todo.sh
@@ -52,4 +52,63 @@ test_expect_success 'todo is re-read after reword and squash' '
 	test_cmp expected actual
 '
 
+test_expect_success 'setup for racy tests' '
+	write_script sequence-editor <<-EOS &&
+		cat >.git/rebase-merge/git-rebase-todo <<-\EOF
+			r $(git rev-parse second^0) second
+			r $(git rev-parse third^0) third
+		EOF
+	EOS
+
+	write_script commit-editor <<-\EOS &&
+		read first_line <"$1" &&
+		echo "$first_line - edited" >"$1" &&
+
+		todo=.git/rebase-merge/git-rebase-todo &&
+
+		if test "$first_line" = second
+		then
+			old_size=$(wc -c <"$todo") &&
+			# Replace the "reword <full-oid> third" line with
+			# a different instruction of the same line length.
+			# Overwrite the file in-place, so the inode stays
+			# the same as well.
+			cat >"$todo" <<-EOF &&
+				exec echo 0123456789012345678901234 >exec-cmd-was-run
+			EOF
+			new_size=$(wc -c <"$todo") &&
+
+			if test $old_size -ne $new_size
+			then
+				echo >&2 "error: bug in the test script: the size of the todo list must not change"
+				exit 1
+			fi
+		fi
+	EOS
+
+	cat >expect <<-\EOF
+	second - edited
+	first
+	EOF
+'
+
+# This test can give false success if your machine is sufficiently
+# slow or all trials happened to happen on second boundaries.
+
+for trial in 0 1 2 3 4
+do
+	test_expect_success "racy todo re-read #$trial" '
+		git reset --hard third &&
+		rm -rf exec-cmd-was-run &&
+
+		GIT_SEQUENCE_EDITOR=./sequence-editor \
+		GIT_EDITOR=./commit-editor \
+		git rebase -i HEAD^^ &&
+
+		test_path_is_file exec-cmd-was-run &&
+		git log --format=%s >actual &&
+		test_cmp expect actual
+	'
+done
+
 test_done
-- 
2.24.0.532.ge18579ded8


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH] sequencer: don't re-read todo for revert and cherry-pick
  2019-11-24 10:44         ` Phillip Wood
  2019-11-24 21:10           ` [PATCH] t3429: try to protect against a potential racy todo file problem SZEDER Gábor
@ 2019-11-25  1:10           ` Junio C Hamano
  2019-11-25 10:47             ` Phillip Wood
  1 sibling, 1 reply; 16+ messages in thread
From: Junio C Hamano @ 2019-11-25  1:10 UTC (permalink / raw)
  To: Phillip Wood; +Cc: SZEDER Gábor, git, Brian Norris, Phillip Wood

Phillip Wood <phillip.wood123@gmail.com> writes:

>>> -		} else if (check_todo && !res) {
>>> +		} else if (is_rebase_i(opts) && check_todo && !res) {
>>
>> It is a bit sad that setting of check_todo is not something a single
>> helper function can decide, so that this is_rebase_i(opts) can be
>> taken into account when that helper function (the logical place
>> would be do_pick_commit()) decides to set (or not set) check_todo.
>>
>> Unfortunately, that is not sufficient, I suspect.  Why did a47ba3c7
>> ("rebase -i: check for updated todo after squash and reword",
>> 2019-08-19) decide to flip check_todo on when running TODO_EXEC?
>
> I'm not sure what you mean by this
>
> This is what I had before I saw Gábor's patch (the tests are pretty
> similar but I think we should check that the messages of all the picks
> are actually edited with --edit - that does not seem to be checked by
> the current tests)...

I first thought that unsetting *check_todo in do_pick_commit(), when
!is_rebase_i(), was a clean solution.  But sadly it is *not* a godo
equivalent to Gábor's patch, because check_todo can be set to true
without looking at is_rebase_i() in pick_commits() [*1*].  To ignore
that setting where the variable's value is used, the hunk we see
above in the beginning of this message is necessary.

That was what I meant.  I think the "This is what I had before"
patch matches my "I first thought" version, so we were on the same
page and both wrong ;-)

Thanks.


[Footnote]

*1* I still do not know why a47ba3c7 ("rebase -i: check for updated
todo after squash and reword", 2019-08-19) sets check_todo to true
without looking at is_rebase_i().  If this unconditonal setting in
TODO_EXEC did not exist, I think your "This is what I had before"
patch would have been equivalent to Gábor's patch.


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] t3429: try to protect against a potential racy todo file problem
  2019-11-24 21:10           ` [PATCH] t3429: try to protect against a potential racy todo file problem SZEDER Gábor
@ 2019-11-25  1:28             ` Junio C Hamano
  2019-11-25  3:10             ` SZEDER Gábor
  2019-11-25 13:18             ` SZEDER Gábor
  2 siblings, 0 replies; 16+ messages in thread
From: Junio C Hamano @ 2019-11-25  1:28 UTC (permalink / raw)
  To: SZEDER Gábor; +Cc: phillip.wood, git, Brian Norris, Johannes Schindelin

SZEDER Gábor <szeder.dev@gmail.com> writes:

> On Sun, Nov 24, 2019 at 10:44:10AM +0000, Phillip Wood wrote:
>> On 24/11/2019 04:49, Junio C Hamano wrote:
>>  ...
>> >Hmph, that makes it sound as if the right fix is to re-read after
>> >writing the first version of the todo file out, so that the stat
>> >data matches reality and tells us that it has never been modified?
>> 
>> I think we should update the stat data after we write the todo list.
>
> Well, yes and no.
>
> No, because we are dealing with regression in v2.24.0 here, so the
> simpler the fix the better it is for maint.  I don't think a fix can
> get any simpler than my patch, with or without the suggestions from
> Phillip.

Of course, the simplest "fix" for regression is to revert the
offending one, and anything else is a band-aid ;-).  The question is
which band-aid is the least risky and which one takes us the closest
to the real solution.  I tend to agree that forcing to skip checking
no matter what the variable "check_todo" says unless is_rebase_i()
qualifies as the band-aid that is the least risky.

> Yes, we should definitely consider updating the stat data after the
> sequencer writes the todo list, or any other options with which the
> sequencer could notice a modified todo list file with less subtlety.
> Alas, there is a big can of worms in that direction, see the patch
> below, and we have to be very careful going that way, so I think it's
> only viable in the long term, but less suitable as a regression fix
> for maint.

Yes, I agree that it is much less suitable than even reverting the
offending one outright.

> (Hrm, perhaps I spent too many words on the all zeroed out stat data,
> and managed to sidetrack you a bit...)

No, I do not think so.  Thinking about what we need to do in the
longer term, while coming up with a shorter term fix, is a necessary
step of gaining confidence in the latter.

Again, thanks both for thinking about this issue.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] t3429: try to protect against a potential racy todo file problem
  2019-11-24 21:10           ` [PATCH] t3429: try to protect against a potential racy todo file problem SZEDER Gábor
  2019-11-25  1:28             ` Junio C Hamano
@ 2019-11-25  3:10             ` SZEDER Gábor
  2019-11-25 13:18             ` SZEDER Gábor
  2 siblings, 0 replies; 16+ messages in thread
From: SZEDER Gábor @ 2019-11-25  3:10 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, phillip.wood, Brian Norris, Johannes Schindelin

On Sun, Nov 24, 2019 at 10:10:21PM +0100, SZEDER Gábor wrote:
> To notice a changed todo file the sequencer stores the file's stat
> data in its 'struct todo_list' instance, and compares it with the
> file's current stat data after 'reword', 'squash' and 'exec'
> instructions.  If the two stat data doesn't match, it re-reads the
> todo file.
> 
> Sounds simple, but there are some subtleties going on here:
> 
>   - The 'struct todo_list' holds the stat data from the time when the
>     todo file was last read.
> 
>   - This stat data in 'struct todo_list' is not updated when the
>     sequencer itself writes the todo file.
> 
>   - Before executing each instruction during an interactive rebase,
>     the sequencer always updates the todo file by removing the
>     just-about-to-be-executed instruction.  This changes the file's
>     size and inode [1].
> 
> Consequently, when the sequencer looks at the stat data after a
> 'reword', 'squash' or 'exec' instruction, it most likely finds that
> they differ, even when the user didn't modify the todo list at all!
> This is not an issue in practice, it just wastes a few cycles on
> re-reading the todo list that matches what the sequencer already has
> in memory anyway.
> 
> However, an unsuspecting Git developer might try to "fix" it by simply
> updating the stat data each time the sequencer writes the todo list
> for an interactive rebase.  On first sight it looks quite sensible and
> straightforward, but we have to be very careful when doing that,
> because potential racy problems lie that way.
> 
> It is possible to overwrite the todo list file without modifying
> either its inode or size, and if such an overwrite were to happen in
> the same second when the file was last read (our stat data has one
> second granularity by default), then the actual stat data on the file
> system would match the stat data that the sequencer has in memory.
> Consequently, such a modification to the todo list file would go
> unnoticed.

> [1] The todo file is written using the lockfile API, i.e. first to the
>     lockfile, which is then moved to place, so the new file can't
>     possibly have the same inode as the file it replaces.  Note,
>     however, that the file system might reuse inodes, so it is
>     possible that the new todo file ends up with the same inode as is
>     recorded in the 'struct todo_list' from the last time the file was
>     read.

Unfortunately, we already do have an issue here when the sequencer can
overlook a modified todo list, but triggering it needs the lucky
"alignment" of inodes as well.  I can trigger it fairly reliably with
the test below, but forcing the inode match makes it kind of gross and
Linux-only.

  https://travis-ci.org/szeder/git/jobs/616460522#L1470


  ---  >8  ---

diff --git a/t/t9999-rebase-racy-todo-reread.sh b/t/t9999-rebase-racy-todo-reread.sh
new file mode 100755
index 0000000000..437ebd55e0
--- /dev/null
+++ b/t/t9999-rebase-racy-todo-reread.sh
@@ -0,0 +1,87 @@
+#!/bin/sh
+
+test_description='racy edit todo reread problem'
+
+. ./test-lib.sh
+
+test_expect_success 'setup' '
+	test_commit first_ &&
+	test_commit second &&
+	test_commit third_ &&
+	test_commit fourth &&
+	test_commit fifth_ &&
+	test_commit sixth_ &&
+
+	write_script sequence-editor <<-\EOS &&
+		todo=.git/rebase-merge/git-rebase-todo &&
+		cat >"$todo" <<-EOF
+			reword $(git rev-parse second^0) second
+			reword $(git rev-parse third_^0) third_
+			reword $(git rev-parse fourth^0) fourth
+		EOF
+	EOS
+
+	write_script commit-editor <<-\EOS &&
+		read first_line <"$1" &&
+		echo "$first_line - edited" >"$1" &&
+
+		todo=.git/rebase-merge/git-rebase-todo &&
+
+		if test "$first_line" = second
+		then
+			stat --format=%i "$todo" >expected-ino
+		elif test "$first_line" = third_
+		then
+			ino=$(cat expected-ino) &&
+			file=$(find . -inum $ino) &&
+			if test -n "$file"
+			then
+				echo &&
+				echo "Trying to free inode $ino by moving \"$file\" out of the way" &&
+				cp -av "$file" "$file".tmp &&
+				rm -fv "$file"
+			fi &&
+
+			cat >"$todo".tmp <<-EOF &&
+			reword $(git rev-parse fifth_^0) fifth_
+			reword $(git rev-parse sixth_^0) sixth_
+			EOF
+			mv -v "$todo".tmp "$todo" &&
+
+			if test "$ino" -eq $(stat --format=%i "$todo")
+			then
+				echo "Yay! The todo list did get inode $ino, just what the sequencer is expecting!"
+			fi &&
+
+			if test -n "$file"
+			then
+				mv -v "$file".tmp "$file"
+			fi
+		fi
+	EOS
+
+	cat >expect <<-\EOF
+	sixth_ - edited
+	fifth_ - edited
+	third_ - edited
+	second - edited
+	first_
+	EOF
+'
+
+for trial in 0 1 2 3 4
+do
+	test_expect_success "demonstrate racy todo re-read problem #$trial" '
+		git reset --hard fourth &&
+		>expected-ino && # placeholder
+
+		GIT_SEQUENCE_EDITOR=./sequence-editor \
+		GIT_EDITOR=./commit-editor \
+		git rebase -i HEAD^^^ &&
+
+		git log --format=%s >actual &&
+		test_cmp expect actual
+	'
+done
+
+test_done

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH] sequencer: don't re-read todo for revert and cherry-pick
  2019-11-25  1:10           ` [PATCH] sequencer: don't re-read todo for revert and cherry-pick Junio C Hamano
@ 2019-11-25 10:47             ` Phillip Wood
  0 siblings, 0 replies; 16+ messages in thread
From: Phillip Wood @ 2019-11-25 10:47 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: SZEDER Gábor, git, Brian Norris, Phillip Wood

Hi Junio

On 25/11/2019 01:10, Junio C Hamano wrote:
> Phillip Wood <phillip.wood123@gmail.com> writes:
> 
>>>> -		} else if (check_todo && !res) {
>>>> +		} else if (is_rebase_i(opts) && check_todo && !res) {
>>>
>>> It is a bit sad that setting of check_todo is not something a single
>>> helper function can decide, so that this is_rebase_i(opts) can be
>>> taken into account when that helper function (the logical place
>>> would be do_pick_commit()) decides to set (or not set) check_todo.
>>>
>>> Unfortunately, that is not sufficient, I suspect.  Why did a47ba3c7
>>> ("rebase -i: check for updated todo after squash and reword",
>>> 2019-08-19) decide to flip check_todo on when running TODO_EXEC?
>>
>> I'm not sure what you mean by this
>>
>> This is what I had before I saw Gábor's patch (the tests are pretty
>> similar but I think we should check that the messages of all the picks
>> are actually edited with --edit - that does not seem to be checked by
>> the current tests)...
> 
> I first thought that unsetting *check_todo in do_pick_commit(), when
> !is_rebase_i(), was a clean solution.  But sadly it is *not* a godo
> equivalent to Gábor's patch, because check_todo can be set to true
> without looking at is_rebase_i() in pick_commits() [*1*]. 

That only happens if we're running a rebase as exec commands are not 
used by cherry-pick and revert so the unconditional setting that remains 
after my suggested fix should be safe. Prior to a47ba3c7 ("rebase -i: 
check for updated todo after squash and reword", 2019-08-19) we always 
checked for an updated todo list after processing an exec command 
without checking is_rebase_i()

Best Wishes

Phillip

> To ignore
> that setting where the variable's value is used, the hunk we see
> above in the beginning of this message is necessary.
> 
> That was what I meant.  I think the "This is what I had before"
> patch matches my "I first thought" version, so we were on the same
> page and both wrong ;-)
> 
> Thanks.
> 
> 
> [Footnote]
> 
> *1* I still do not know why a47ba3c7 ("rebase -i: check for updated
> todo after squash and reword", 2019-08-19) sets check_todo to true
> without looking at is_rebase_i().  If this unconditonal setting in
> TODO_EXEC did not exist, I think your "This is what I had before"
> patch would have been equivalent to Gábor's patch.


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] t3429: try to protect against a potential racy todo file problem
  2019-11-24 21:10           ` [PATCH] t3429: try to protect against a potential racy todo file problem SZEDER Gábor
  2019-11-25  1:28             ` Junio C Hamano
  2019-11-25  3:10             ` SZEDER Gábor
@ 2019-11-25 13:18             ` SZEDER Gábor
  2019-11-25 14:43               ` Phillip Wood
  2 siblings, 1 reply; 16+ messages in thread
From: SZEDER Gábor @ 2019-11-25 13:18 UTC (permalink / raw)
  To: phillip.wood; +Cc: Junio C Hamano, git, Brian Norris, Johannes Schindelin

On Sun, Nov 24, 2019 at 10:10:21PM +0100, SZEDER Gábor wrote:
> To notice a changed todo file the sequencer stores the file's stat
> data in its 'struct todo_list' instance, and compares it with the
> file's current stat data after 'reword', 'squash' and 'exec'
> instructions.  If the two stat data doesn't match, it re-reads the
> todo file.
> 
> Sounds simple, but there are some subtleties going on here:
> 
>   - The 'struct todo_list' holds the stat data from the time when the
>     todo file was last read.
> 
>   - This stat data in 'struct todo_list' is not updated when the
>     sequencer itself writes the todo file.
> 
>   - Before executing each instruction during an interactive rebase,
>     the sequencer always updates the todo file by removing the
>     just-about-to-be-executed instruction.  This changes the file's
>     size and inode [1].
> 
> Consequently, when the sequencer looks at the stat data after a
> 'reword', 'squash' or 'exec' instruction, it most likely finds that
> they differ, even when the user didn't modify the todo list at all!
> This is not an issue in practice, it just wastes a few cycles on
> re-reading the todo list that matches what the sequencer already has
> in memory anyway.

It can be much more than just a few cycles, because the total number
of parsed instructions from all the todo file re-reads can go
quadratic with the number of rebased commits.

The simple test below runs 'git rebase -i -x' on 1000 commits, which
takes over 14seconds to run.  If it doesn't re-read the todo file at
all (I simply deleted the whole condition block checking the stat data
and re-reading) it runs for only ~2.5secs.

Just another angle to consider...


  ---  >8  ---

test_expect_success 'test' '
	num_commits=1000 &&
	test_commit_bulk --filename=file $num_commits &&

	/usr/bin/time -f "Elapsed time: %e" \
		git rebase -i --root -x true 2>out &&

	grep "Executing: true" out >actual &&
	test_line_count = $num_commits actual &&

	# show the elapsed time
	tail -n2 out
'

  ---  >8  ---


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] t3429: try to protect against a potential racy todo file problem
  2019-11-25 13:18             ` SZEDER Gábor
@ 2019-11-25 14:43               ` Phillip Wood
  2019-11-25 15:15                 ` SZEDER Gábor
  0 siblings, 1 reply; 16+ messages in thread
From: Phillip Wood @ 2019-11-25 14:43 UTC (permalink / raw)
  To: SZEDER Gábor, phillip.wood
  Cc: Junio C Hamano, git, Brian Norris, Johannes Schindelin

On 25/11/2019 13:18, SZEDER Gábor wrote:
> On Sun, Nov 24, 2019 at 10:10:21PM +0100, SZEDER Gábor wrote:
>> To notice a changed todo file the sequencer stores the file's stat
>> data in its 'struct todo_list' instance, and compares it with the
>> file's current stat data after 'reword', 'squash' and 'exec'
>> instructions.  If the two stat data doesn't match, it re-reads the
>> todo file.
>>
>> Sounds simple, but there are some subtleties going on here:
>>
>>    - The 'struct todo_list' holds the stat data from the time when the
>>      todo file was last read.
>>
>>    - This stat data in 'struct todo_list' is not updated when the
>>      sequencer itself writes the todo file.
>>
>>    - Before executing each instruction during an interactive rebase,
>>      the sequencer always updates the todo file by removing the
>>      just-about-to-be-executed instruction.  This changes the file's
>>      size and inode [1].
>>
>> Consequently, when the sequencer looks at the stat data after a
>> 'reword', 'squash' or 'exec' instruction, it most likely finds that
>> they differ, even when the user didn't modify the todo list at all!
>> This is not an issue in practice, it just wastes a few cycles on
>> re-reading the todo list that matches what the sequencer already has
>> in memory anyway.
> 
> It can be much more than just a few cycles, because the total number
> of parsed instructions from all the todo file re-reads can go
> quadratic with the number of rebased commits.
> 
> The simple test below runs 'git rebase -i -x' on 1000 commits, which
> takes over 14seconds to run.  If it doesn't re-read the todo file at
> all (I simply deleted the whole condition block checking the stat data
> and re-reading) it runs for only ~2.5secs.
> 
> Just another angle to consider...

I know dscho was keen to avoid re-parsing the list all the time [1] 
presumably because of the quadratic behavior. (He also assumed most 
people were using ns stat times [2] but that appears not to be the case) 
Could we just compare the text of the todo list on disk to whats in 
todo->buf.buf (with an appropriate offset)? That would avoid parsing the 
text and looking up all the commits with get_oid()

Best Wishes

Phillip

[1] 
https://public-inbox.org/git/alpine.DEB.2.20.1703021617510.3767@virtualbox/
[2] 
https://public-inbox.org/git/alpine.DEB.2.20.1704131526500.2135@virtualbox/

> 
>    ---  >8  ---
> 
> test_expect_success 'test' '
> 	num_commits=1000 &&
> 	test_commit_bulk --filename=file $num_commits &&
> 
> 	/usr/bin/time -f "Elapsed time: %e" \
> 		git rebase -i --root -x true 2>out &&
> 
> 	grep "Executing: true" out >actual &&
> 	test_line_count = $num_commits actual &&
> 
> 	# show the elapsed time
> 	tail -n2 out
> '
> 
>    ---  >8  ---
> 

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] t3429: try to protect against a potential racy todo file problem
  2019-11-25 14:43               ` Phillip Wood
@ 2019-11-25 15:15                 ` SZEDER Gábor
  2019-11-25 16:40                   ` Phillip Wood
  0 siblings, 1 reply; 16+ messages in thread
From: SZEDER Gábor @ 2019-11-25 15:15 UTC (permalink / raw)
  To: phillip.wood; +Cc: Junio C Hamano, git, Brian Norris, Johannes Schindelin

On Mon, Nov 25, 2019 at 02:43:07PM +0000, Phillip Wood wrote:
> On 25/11/2019 13:18, SZEDER Gábor wrote:
> >On Sun, Nov 24, 2019 at 10:10:21PM +0100, SZEDER Gábor wrote:
> >>To notice a changed todo file the sequencer stores the file's stat
> >>data in its 'struct todo_list' instance, and compares it with the
> >>file's current stat data after 'reword', 'squash' and 'exec'
> >>instructions.  If the two stat data doesn't match, it re-reads the
> >>todo file.
> >>
> >>Sounds simple, but there are some subtleties going on here:
> >>
> >>   - The 'struct todo_list' holds the stat data from the time when the
> >>     todo file was last read.
> >>
> >>   - This stat data in 'struct todo_list' is not updated when the
> >>     sequencer itself writes the todo file.
> >>
> >>   - Before executing each instruction during an interactive rebase,
> >>     the sequencer always updates the todo file by removing the
> >>     just-about-to-be-executed instruction.  This changes the file's
> >>     size and inode [1].
> >>
> >>Consequently, when the sequencer looks at the stat data after a
> >>'reword', 'squash' or 'exec' instruction, it most likely finds that
> >>they differ, even when the user didn't modify the todo list at all!
> >>This is not an issue in practice, it just wastes a few cycles on
> >>re-reading the todo list that matches what the sequencer already has
> >>in memory anyway.
> >
> >It can be much more than just a few cycles, because the total number
> >of parsed instructions from all the todo file re-reads can go
> >quadratic with the number of rebased commits.
> >
> >The simple test below runs 'git rebase -i -x' on 1000 commits, which
> >takes over 14seconds to run.  If it doesn't re-read the todo file at
> >all (I simply deleted the whole condition block checking the stat data
> >and re-reading) it runs for only ~2.5secs.
> >
> >Just another angle to consider...
> 
> I know dscho was keen to avoid re-parsing the list all the time [1]
> presumably because of the quadratic behavior. (He also assumed most people
> were using ns stat times [2] but that appears not to be the case)

Nanosecond file timestamp comparisons are only enabled by the USE_NSEC
macro, which is only defined if the USE_NSEC Makefile knob is enabled,
but that is not enabled by default.

Then there is the related NO_NSEC Makefile knob:

  # Define NO_NSEC if your "struct stat" does not have "st_ctim.tv_nsec"
  # available.  This automatically turns USE_NSEC off.

As Dscho mentioned in [2], we do disable nanosecond file timestamp
comparisons in 'config.mak.uname' on a handful of platforms by setting
NO_NSEC.  This, however, does not mean that nanosec timestamps are
enabled on other platforms by default.

> Could we
> just compare the text of the todo list on disk to whats in todo->buf.buf
> (with an appropriate offset)? That would avoid parsing the text and looking
> up all the commits with get_oid()

Comparing the contents without parsing is still quadratic in the size
of the todo list, though I suppose with a much lower constant factor
than actually parsing it.

> [1]
> https://public-inbox.org/git/alpine.DEB.2.20.1703021617510.3767@virtualbox/
> [2]
> https://public-inbox.org/git/alpine.DEB.2.20.1704131526500.2135@virtualbox/
> 
> >
> >   ---  >8  ---
> >
> >test_expect_success 'test' '
> >	num_commits=1000 &&
> >	test_commit_bulk --filename=file $num_commits &&
> >
> >	/usr/bin/time -f "Elapsed time: %e" \
> >		git rebase -i --root -x true 2>out &&
> >
> >	grep "Executing: true" out >actual &&
> >	test_line_count = $num_commits actual &&
> >
> >	# show the elapsed time
> >	tail -n2 out
> >'
> >
> >   ---  >8  ---
> >

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] t3429: try to protect against a potential racy todo file problem
  2019-11-25 15:15                 ` SZEDER Gábor
@ 2019-11-25 16:40                   ` Phillip Wood
  0 siblings, 0 replies; 16+ messages in thread
From: Phillip Wood @ 2019-11-25 16:40 UTC (permalink / raw)
  To: SZEDER Gábor, phillip.wood
  Cc: Junio C Hamano, git, Brian Norris, Johannes Schindelin

On 25/11/2019 15:15, SZEDER Gábor wrote:
> On Mon, Nov 25, 2019 at 02:43:07PM +0000, Phillip Wood wrote:
>> On 25/11/2019 13:18, SZEDER Gábor wrote:
>>> On Sun, Nov 24, 2019 at 10:10:21PM +0100, SZEDER Gábor wrote:
>>>> To notice a changed todo file the sequencer stores the file's stat
>>>> data in its 'struct todo_list' instance, and compares it with the
>>>> file's current stat data after 'reword', 'squash' and 'exec'
>>>> instructions.  If the two stat data doesn't match, it re-reads the
>>>> todo file.
>>>>
>>>> Sounds simple, but there are some subtleties going on here:
>>>>
>>>>    - The 'struct todo_list' holds the stat data from the time when the
>>>>      todo file was last read.
>>>>
>>>>    - This stat data in 'struct todo_list' is not updated when the
>>>>      sequencer itself writes the todo file.
>>>>
>>>>    - Before executing each instruction during an interactive rebase,
>>>>      the sequencer always updates the todo file by removing the
>>>>      just-about-to-be-executed instruction.  This changes the file's
>>>>      size and inode [1].
>>>>
>>>> Consequently, when the sequencer looks at the stat data after a
>>>> 'reword', 'squash' or 'exec' instruction, it most likely finds that
>>>> they differ, even when the user didn't modify the todo list at all!
>>>> This is not an issue in practice, it just wastes a few cycles on
>>>> re-reading the todo list that matches what the sequencer already has
>>>> in memory anyway.
>>>
>>> It can be much more than just a few cycles, because the total number
>>> of parsed instructions from all the todo file re-reads can go
>>> quadratic with the number of rebased commits.
>>>
>>> The simple test below runs 'git rebase -i -x' on 1000 commits, which
>>> takes over 14seconds to run.  If it doesn't re-read the todo file at
>>> all (I simply deleted the whole condition block checking the stat data
>>> and re-reading) it runs for only ~2.5secs.
>>>
>>> Just another angle to consider...
>>
>> I know dscho was keen to avoid re-parsing the list all the time [1]
>> presumably because of the quadratic behavior. (He also assumed most people
>> were using ns stat times [2] but that appears not to be the case)
> 
> Nanosecond file timestamp comparisons are only enabled by the USE_NSEC
> macro, which is only defined if the USE_NSEC Makefile knob is enabled,
> but that is not enabled by default.
> 
> Then there is the related NO_NSEC Makefile knob:
> 
>    # Define NO_NSEC if your "struct stat" does not have "st_ctim.tv_nsec"
>    # available.  This automatically turns USE_NSEC off.
> 
> As Dscho mentioned in [2], we do disable nanosecond file timestamp
> comparisons in 'config.mak.uname' on a handful of platforms by setting
> NO_NSEC.  This, however, does not mean that nanosec timestamps are
> enabled on other platforms by default.
> 
>> Could we
>> just compare the text of the todo list on disk to whats in todo->buf.buf
>> (with an appropriate offset)? That would avoid parsing the text and looking
>> up all the commits with get_oid()
> 
> Comparing the contents without parsing is still quadratic in the size
> of the todo list, though I suppose with a much lower constant factor
> than actually parsing it.

The patch below (assuming thunderbird doesn't mangle it) reduces the 
time to run your bulk commit test from 30s to 7s, if I delete the 
condition block which checks the stat data it takes 4.7s on my (somewhat 
ancient) laptop. So there is a cost to the string comparison approach 
but it's much less that the full todo list parsing.

Best Wishes

Phillip

--- >8 ---
diff --git a/sequencer.c b/sequencer.c
index 8952cfa89b..a3efdae0a5 100644
--- a/sequencer.c
+++ b/sequencer.c
@@ -3909,12 +3909,17 @@ static int pick_commits(struct repository *r,
                                                         arg, item->arg_len,
                                                         opts, res, 0);
                 } else if (check_todo && !res) {
-                       struct stat st;
-
-                       if (stat(get_todo_path(opts), &st)) {
-                               res = error_errno(_("could not stat '%s'"),
+                       int offset;
+                       struct strbuf buf = STRBUF_INIT;
+                       if (strbuf_read_file(&buf, get_todo_path(opts), 
8096) < 0)
+                               res = error_errno(_("could not read '%s'"),
                                                   get_todo_path(opts));
-                       } else if (match_stat_data(&todo_list->stat, &st)) {
+
+                       offset = get_item_line_offset(todo_list,
+                                                     todo_list->current 
+ 1);
+                       if (buf.len != todo_list->buf.len  - offset ||
+                           memcmp(buf.buf, todo_list->buf.buf + offset, 
buf.len)) {
+                               fputs("re-reading todo list\n", stderr);
                                 /* Reread the todo file if it has 
changed. */
                                 todo_list_release(todo_list);
                                 if (read_populate_todo(r, todo_list, opts))


^ permalink raw reply related	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2019-11-25 16:40 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-11-22 23:10 git 2.24: git revert <commit1> <commit2> requires extra '--continue'? Brian Norris
2019-11-23  0:34 ` SZEDER Gábor
2019-11-23  9:53   ` Phillip Wood
2019-11-23 17:20     ` [PATCH] sequencer: don't re-read todo for revert and cherry-pick SZEDER Gábor
2019-11-23 21:14       ` Johannes Schindelin
2019-11-24  4:49       ` Junio C Hamano
2019-11-24 10:44         ` Phillip Wood
2019-11-24 21:10           ` [PATCH] t3429: try to protect against a potential racy todo file problem SZEDER Gábor
2019-11-25  1:28             ` Junio C Hamano
2019-11-25  3:10             ` SZEDER Gábor
2019-11-25 13:18             ` SZEDER Gábor
2019-11-25 14:43               ` Phillip Wood
2019-11-25 15:15                 ` SZEDER Gábor
2019-11-25 16:40                   ` Phillip Wood
2019-11-25  1:10           ` [PATCH] sequencer: don't re-read todo for revert and cherry-pick Junio C Hamano
2019-11-25 10:47             ` Phillip Wood

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).