* Comparing rebase --am with --interactive via p3400
@ 2019-02-01  6:04 Johannes Schindelin
  2019-02-01  7:22 ` Johannes Schindelin
                   ` (2 more replies)
  0 siblings, 3 replies; 11+ messages in thread
From: Johannes Schindelin @ 2019-02-01  6:04 UTC (permalink / raw)
  To: Elijah Newren; +Cc: git

Hi Elijah,

as discussed at the Contributors' Summit, I ran p3400 as-is (i.e. with the
--am backend) and then with --keep-empty to force the interactive backend
to be used. Here are the best of 10, on my relatively powerful Windows 10
laptop, with current `master`.

With regular rebase --am:

3400.2: rebase on top of a lot of unrelated changes             5.32(0.06+0.15)
3400.4: rebase a lot of unrelated changes without split-index   33.08(0.04+0.18)
3400.6: rebase a lot of unrelated changes with split-index      30.29(0.03+0.18)

with --keep-empty to force the interactive backend:

3400.2: rebase on top of a lot of unrelated changes             3.92(0.03+0.18)
3400.4: rebase a lot of unrelated changes without split-index   33.92(0.03+0.22)
3400.6: rebase a lot of unrelated changes with split-index      38.82(0.03+0.16)

I then changed it to -m to test the current scripted version, trying to
let it run overnight, but my laptop eventually went to sleep and the tests
were not even done. I'll let them continue and report back.

My conclusion after seeing these numbers is: the interactive rebase is
really close to the performance of the --am backend. So to me, it makes a
total lot of sense to switch --merge over to it, and to make --merge the
default. We still should investigate why the split-index performance is so
significantly worse, though.

Ciao,
Dscho

^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Comparing rebase --am with --interactive via p3400
  2019-02-01  6:04 Comparing rebase --am with --interactive via p3400 Johannes Schindelin
@ 2019-02-01  7:22 ` Johannes Schindelin
  2019-02-01  9:26 ` Elijah Newren
  2019-12-27 21:11 ` Alban Gruin
  2 siblings, 0 replies; 11+ messages in thread
From: Johannes Schindelin @ 2019-02-01  7:22 UTC (permalink / raw)
  To: Elijah Newren; +Cc: git

Hi Elijah,

On Fri, 1 Feb 2019, Johannes Schindelin wrote:

> as discussed at the Contributors' Summit, I ran p3400 as-is (i.e. with the
> --am backend) and then with --keep-empty to force the interactive backend
> to be used. Here are the best of 10, on my relatively powerful Windows 10
> laptop, with current `master`.
>
> With regular rebase --am:
>
> 3400.2: rebase on top of a lot of unrelated changes             5.32(0.06+0.15)
> 3400.4: rebase a lot of unrelated changes without split-index   33.08(0.04+0.18)
> 3400.6: rebase a lot of unrelated changes with split-index      30.29(0.03+0.18)
>
> with --keep-empty to force the interactive backend:
>
> 3400.2: rebase on top of a lot of unrelated changes             3.92(0.03+0.18)
> 3400.4: rebase a lot of unrelated changes without split-index   33.92(0.03+0.22)
> 3400.6: rebase a lot of unrelated changes with split-index      38.82(0.03+0.16)
>
> I then changed it to -m to test the current scripted version, trying to
> let it run overnight, but my laptop eventually went to sleep and the tests
> were not even done. I'll let them continue and report back.

It finally finished:

3400.2: rebase on top of a lot of unrelated changes             7.37(0.09+0.19)
3400.4: rebase a lot of unrelated changes without split-index   393.96(0.04+0.15)
3400.6: rebase a lot of unrelated changes with split-index      404.65(0.01+0.24)

So there is a seemingly significant cost to using the split-index that is
just very unfortunate.

In any case, just switching from the scripted --merge backend to the
built-in interactive backend results in a >10x faster execution.

So I *definitely* want that scripted `--merge` backend to go away.

Thank you for doing this.

Ciao,
Dscho

^ permalink raw reply [flat|nested] 11+ messages in thread
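As a quick check of the ">10x" figure in the message above, the ratio between the scripted `-m` timings and the `--keep-empty` (interactive backend) timings can be computed directly from the reported wall-clock numbers. This is only a minimal sketch in C; `speedup()` is a hypothetical helper and is not part of git or its perf suite.

```c
#include <assert.h>

/*
 * Hypothetical helper (not part of git): how many times faster the new
 * timing is than the old one. Both arguments are wall-clock seconds.
 */
double speedup(double old_seconds, double new_seconds)
{
	assert(old_seconds > 0.0 && new_seconds > 0.0);
	return old_seconds / new_seconds;
}
```

With the 3400.4 timings, `speedup(393.96, 33.92)` is roughly 11.6; with the 3400.6 timings, `speedup(404.65, 38.82)` is roughly 10.4.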
* Re: Comparing rebase --am with --interactive via p3400
  2019-02-01  6:04 Comparing rebase --am with --interactive via p3400 Johannes Schindelin
  2019-02-01  7:22 ` Johannes Schindelin
@ 2019-02-01  9:26 ` Elijah Newren
  2019-12-27 21:11 ` Alban Gruin
  2 siblings, 0 replies; 11+ messages in thread
From: Elijah Newren @ 2019-02-01  9:26 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: Git Mailing List

Hi Dscho,

On Thu, Jan 31, 2019 at 10:04 PM Johannes Schindelin
<Johannes.Schindelin@gmx.de> wrote:
>
> Hi Elijah,
>
> as discussed at the Contributors' Summit, I ran p3400 as-is (i.e. with the
> --am backend) and then with --keep-empty to force the interactive backend
> to be used. Here are the best of 10, on my relatively powerful Windows 10
> laptop, with current `master`.
>
> With regular rebase --am:
>
> 3400.2: rebase on top of a lot of unrelated changes             5.32(0.06+0.15)
> 3400.4: rebase a lot of unrelated changes without split-index   33.08(0.04+0.18)
> 3400.6: rebase a lot of unrelated changes with split-index      30.29(0.03+0.18)
>
> with --keep-empty to force the interactive backend:
>
> 3400.2: rebase on top of a lot of unrelated changes             3.92(0.03+0.18)
> 3400.4: rebase a lot of unrelated changes without split-index   33.92(0.03+0.22)
> 3400.6: rebase a lot of unrelated changes with split-index      38.82(0.03+0.16)

Awesome, thanks for checking that out. I ran on both linux and mac
and saw similar relative performances. Comparing am-based rebase to an
implied-interactive rebase on both linux and mac (with a version of
git including en/rebase-merge-on-sequencer so that -m gives the same
performance that you'd see with --keep-empty), I saw:

On Linux:

am-based rebase (without -m):

3400.2: rebase on top of a lot of unrelated changes             1.87(1.64+0.21)
3400.4: rebase a lot of unrelated changes without split-index   7.87(6.24+1.00)
3400.6: rebase a lot of unrelated changes with split-index      5.99(5.05+0.67)

interactive-machinery rebase (with -m):

3400.2: rebase on top of a lot of unrelated changes             1.80(1.60+0.19)
3400.4: rebase a lot of unrelated changes without split-index   6.78(5.70+0.91)
3400.6: rebase a lot of unrelated changes with split-index      6.92(5.70+0.89)

On Mac:

am-based rebase (without -m):

Test                                                            this tree
-------------------------------------------------------------------------------
3400.2: rebase on top of a lot of unrelated changes             2.68(1.68+0.68)
3400.4: rebase a lot of unrelated changes without split-index   8.89(5.86+2.94)
3400.6: rebase a lot of unrelated changes with split-index      7.87(5.35+2.51)

interactive-machinery rebase (with -m):

Test                                                            this tree
-------------------------------------------------------------------------------
3400.2: rebase on top of a lot of unrelated changes             1.99(1.61+0.77)
3400.4: rebase a lot of unrelated changes without split-index   8.63(5.38+3.38)
3400.6: rebase a lot of unrelated changes with split-index      9.36(5.53+3.95)

> I then changed it to -m to test the current scripted version, trying to
> let it run overnight, but my laptop eventually went to sleep and the tests
> were not even done. I'll let them continue and report back.
>
> My conclusion after seeing these numbers is: the interactive rebase is
> really close to the performance of the --am backend. So to me, it makes a
> total lot of sense to switch --merge over to it, and to make --merge the
> default. We still should investigate why the split-index performance is so
> significantly worse, though.

Cool, I'll update my patches to make --merge the default (building on
top of en/rebase-merge-on-sequencer) and post it as an RFC. But yeah,
we should also check into why the split-index performance becomes a
bit worse with such a change.

Thanks,
Elijah

^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Comparing rebase --am with --interactive via p3400
  2019-02-01  6:04 Comparing rebase --am with --interactive via p3400 Johannes Schindelin
  2019-02-01  7:22 ` Johannes Schindelin
  2019-02-01  9:26 ` Elijah Newren
@ 2019-12-27 21:11 ` Alban Gruin
  2019-12-27 22:45   ` Elijah Newren
  2020-01-31 21:23   ` Johannes Schindelin
  2 siblings, 2 replies; 11+ messages in thread
From: Alban Gruin @ 2019-12-27 21:11 UTC (permalink / raw)
  To: Johannes Schindelin, Elijah Newren; +Cc: git

[-- Attachment #1: Type: text/plain, Size: 5067 bytes --]

Hi Johannes & Elijah,

On 01/02/2019 at 07:04, Johannes Schindelin wrote:
> Hi Elijah,
>
> as discussed at the Contributors' Summit, I ran p3400 as-is (i.e. with the
> --am backend) and then with --keep-empty to force the interactive backend
> to be used. Here are the best of 10, on my relatively powerful Windows 10
> laptop, with current `master`.
>
> With regular rebase --am:
>
> 3400.2: rebase on top of a lot of unrelated changes             5.32(0.06+0.15)
> 3400.4: rebase a lot of unrelated changes without split-index   33.08(0.04+0.18)
> 3400.6: rebase a lot of unrelated changes with split-index      30.29(0.03+0.18)
>
> with --keep-empty to force the interactive backend:
>
> 3400.2: rebase on top of a lot of unrelated changes             3.92(0.03+0.18)
> 3400.4: rebase a lot of unrelated changes without split-index   33.92(0.03+0.22)
> 3400.6: rebase a lot of unrelated changes with split-index      38.82(0.03+0.16)
>
> I then changed it to -m to test the current scripted version, trying to
> let it run overnight, but my laptop eventually went to sleep and the tests
> were not even done. I'll let them continue and report back.
>
> My conclusion after seeing these numbers is: the interactive rebase is
> really close to the performance of the --am backend. So to me, it makes a
> total lot of sense to switch --merge over to it, and to make --merge the
> default. We still should investigate why the split-index performance is so
> significantly worse, though.
>
> Ciao,
> Dscho
>

I investigated this a bit. From a quick glance at a callgrind trace,
I can see that ce_write_entry() is called 20 601[1] times with `git am',
but 739 802 times with the sequencer when the split-index is enabled.

For reference, here are the timings, measured on my Linux machine, on a
tmpfs, with git.git as the repo:

`rebase --am':
> 3400.2: rebase on top of a lot of unrelated changes             0.29(0.24+0.03)
> 3400.4: rebase a lot of unrelated changes without split-index   6.77(6.51+0.22)
> 3400.6: rebase a lot of unrelated changes with split-index      4.43(4.29+0.13)

`rebase --quiet':
> 3400.2: rebase on top of a lot of unrelated changes             0.24(0.21+0.02)
> 3400.4: rebase a lot of unrelated changes without split-index   5.60(5.32+0.27)
> 3400.6: rebase a lot of unrelated changes with split-index      5.67(5.40+0.26)

This comes from two things:

1. There are not enough shared entries in the index with the sequencer.

do_write_index() is called only by do_write_locked_index() with `--am',
but is also called by write_shared_index() with the sequencer once for
every other commit. The latter is only called by
write_locked_index(), which means that too_many_not_shared_entries()
returns true for the sequencer, but never for `--am'.

Removing the call to discard_index() in do_pick_commit() (as in the
first attached patch) solves this particular issue, but this would
require a more thorough analysis to see if it is actually safe to do.

After this, ce_write() is still called much more by the sequencer.

Here are the results of `rebase --quiet' without discarding the index:

> 3400.2: rebase on top of a lot of unrelated changes             0.23(0.19+0.04)
> 3400.4: rebase a lot of unrelated changes without split-index   5.14(4.95+0.18)
> 3400.6: rebase a lot of unrelated changes with split-index      5.02(4.87+0.15)

The performance of the rebase is better in both cases.

2. The base index is dropped by unpack_trees_start() and unpack_trees().

Now, write_shared_index() is no longer called and write_locked_index()
is less expensive than before according to callgrind. But
ce_write_entry() is still called 749 302 times (which is even more than
before.)

The only place where ce_write_entry() is called is in a loop in
do_write_index(). The number of iterations is dictated by the size of
the cache, and there is a trace2 probe dumping this value.

For `--am', the value goes like this: 1, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4,
4, 4, 5, 5, 5, 5, … up until 101.

For the sequencer, it goes like this: 1, 1, 3697, 3697, 3698, 3698,
3699, 3699, … up until 3796.

The size of the cache is set in prepare_to_write_split_index(). It
grows if a cache entry has no index (most of them should have one by
now), or if the split index has no base index (with `--am', the split
index always has a base.) This comes from unpack_trees_start() -- it
creates a new index, and unpack_trees() does not carry the base index,
hence the size of the cache.

The second attached patch (which is broken for the non-interactive
rebase case) demonstrates what we could expect for the split-index case
if we fix this:

> 3400.2: rebase on top of a lot of unrelated changes             0.24(0.21+0.03)
> 3400.4: rebase a lot of unrelated changes without split-index   5.81(5.62+0.17)
> 3400.6: rebase a lot of unrelated changes with split-index      4.76(4.54+0.20)

So, for everything related to the index, I think that’s it.

[1] Numbers may vary, but they should remain in the same order of magnitude.

Cheers,
Alban

[-- Attachment #2: sequencer-rebase-si.patch --]
[-- Type: text/x-patch, Size: 317 bytes --]

diff --git a/sequencer.c b/sequencer.c
index 1bee26ebd5..2831abd0fa 100644
--- a/sequencer.c
+++ b/sequencer.c
@@ -1863,7 +1863,6 @@ static int do_pick_commit(struct repository *r,
 					NULL, 0))
 			return error_dirty_index(r, opts);
 	}
-	discard_index(r->index);
 
 	if (!commit->parents)
 		parent = NULL;

[-- Attachment #3: merge-recursive-rebase-si.patch --]
[-- Type: text/x-patch, Size: 1367 bytes --]

diff --git a/merge-recursive.c b/merge-recursive.c
index 11869ad81c..47f67079f3 100644
--- a/merge-recursive.c
+++ b/merge-recursive.c
@@ -421,7 +421,7 @@ static int unpack_trees_start(struct merge_options *opt,
 {
 	int rc;
 	struct tree_desc t[3];
-	struct index_state tmp_index = { NULL };
+	/* struct index_state tmp_index = { NULL }; */
 
 	memset(&opt->priv->unpack_opts, 0, sizeof(opt->priv->unpack_opts));
 	if (opt->priv->call_depth)
@@ -432,7 +432,7 @@ static int unpack_trees_start(struct merge_options *opt,
 	opt->priv->unpack_opts.head_idx = 2;
 	opt->priv->unpack_opts.fn = threeway_merge;
 	opt->priv->unpack_opts.src_index = opt->repo->index;
-	opt->priv->unpack_opts.dst_index = &tmp_index;
+	opt->priv->unpack_opts.dst_index = opt->repo->index;
 	opt->priv->unpack_opts.aggressive = !merge_detect_rename(opt);
 	setup_unpack_trees_porcelain(&opt->priv->unpack_opts, "merge");
 
@@ -449,8 +449,8 @@ static int unpack_trees_start(struct merge_options *opt,
 	 * saved copy. (verify_uptodate() checks src_index, and the original
 	 * index is the one that had the necessary modification timestamps.)
 	 */
-	opt->priv->orig_index = *opt->repo->index;
-	*opt->repo->index = tmp_index;
+	/* opt->priv->orig_index = *opt->repo->index; */
+	/* *opt->repo->index = tmp_index; */
 	opt->priv->unpack_opts.src_index = &opt->priv->orig_index;
 
 	return rc;

^ permalink raw reply related [flat|nested] 11+ messages in thread
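The two ce_write_entry() call counts in the analysis above can be reproduced from the reported cache-size sequences. The sketch below assumes that each index write calls ce_write_entry() once per entry in the cache being written, which is how the message describes the loop in do_write_index(). The helper name `total_entries_written()` and its parameters are hypothetical; they only model the arithmetic and are not git code.

```c
#include <assert.h>

/*
 * Hypothetical model (not git code): total number of entries written when
 * `prefix` initial writes carry one entry each, and every cache size from
 * `first` to `last` (inclusive) is then written `repeats` times.
 */
long total_entries_written(long prefix, long first, long last, long repeats)
{
	long total = prefix;
	long size;

	assert(first <= last && repeats > 0 && prefix >= 0);
	for (size = first; size <= last; size++)
		total += repeats * size;
	return total;
}
```

For `--am' (1, then 2 through 101 four times each), `total_entries_written(1, 2, 101, 4)` gives 20601. For the sequencer (1, 1, then 3697 through 3796 twice each), `total_entries_written(2, 3697, 3796, 2)` gives 749302. These match the 20 601 and 749 302 figures quoted in the message, the latter being the count measured after the discard_index() call was removed.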
* Re: Comparing rebase --am with --interactive via p3400
  2019-12-27 21:11 ` Alban Gruin
@ 2019-12-27 22:45   ` Elijah Newren
  2019-12-29 17:25     ` Alban Gruin
  2020-01-31 21:23   ` Johannes Schindelin
  1 sibling, 1 reply; 11+ messages in thread
From: Elijah Newren @ 2019-12-27 22:45 UTC (permalink / raw)
  To: Alban Gruin; +Cc: Johannes Schindelin, Git Mailing List

Hi Alban,

On Fri, Dec 27, 2019 at 1:11 PM Alban Gruin <alban.gruin@gmail.com> wrote:
>
> Hi Johannes & Elijah,
>
> On 01/02/2019 at 07:04, Johannes Schindelin wrote:
> > Hi Elijah,
> >
> > as discussed at the Contributors' Summit, I ran p3400 as-is (i.e. with the
> > --am backend) and then with --keep-empty to force the interactive backend
> > to be used. Here are the best of 10, on my relatively powerful Windows 10
> > laptop, with current `master`.
> >
> > With regular rebase --am:
> >
> > 3400.2: rebase on top of a lot of unrelated changes             5.32(0.06+0.15)
> > 3400.4: rebase a lot of unrelated changes without split-index   33.08(0.04+0.18)
> > 3400.6: rebase a lot of unrelated changes with split-index      30.29(0.03+0.18)
> >
> > with --keep-empty to force the interactive backend:
> >
> > 3400.2: rebase on top of a lot of unrelated changes             3.92(0.03+0.18)
> > 3400.4: rebase a lot of unrelated changes without split-index   33.92(0.03+0.22)
> > 3400.6: rebase a lot of unrelated changes with split-index      38.82(0.03+0.16)
> >
> > I then changed it to -m to test the current scripted version, trying to
> > let it run overnight, but my laptop eventually went to sleep and the tests
> > were not even done. I'll let them continue and report back.
> >
> > My conclusion after seeing these numbers is: the interactive rebase is
> > really close to the performance of the --am backend. So to me, it makes a
> > total lot of sense to switch --merge over to it, and to make --merge the
> > default. We still should investigate why the split-index performance is so
> > significantly worse, though.
> >
> > Ciao,
> > Dscho
> >
>
> I investigated this a bit. From a quick glance at a callgrind trace,
> I can see that ce_write_entry() is called 20 601[1] times with `git am',
> but 739 802 times with the sequencer when the split-index is enabled.

Sweet, thanks for digging in and analyzing this.

> For reference, here are the timings, measured on my Linux machine, on a
> tmpfs, with git.git as the repo:
>
> `rebase --am':
> > 3400.2: rebase on top of a lot of unrelated changes             0.29(0.24+0.03)
> > 3400.4: rebase a lot of unrelated changes without split-index   6.77(6.51+0.22)
> > 3400.6: rebase a lot of unrelated changes with split-index      4.43(4.29+0.13)
> `rebase --quiet':

--quiet? Isn't that flag supposed to work with both backends and not
imply either one? We previously used --keep-empty, though there's a
chance that flag means we're not doing a fair comparison (since 'am'
will drop empty commits and thus have less work to do). Is there any
chance you actually ran a different command, but when you went to
summarize just typed the wrong flag name? Anyway, the best would
probably be to use --merge here (at the time Johannes and I were
testing, that wouldn't have triggered the sequencer, but it does now),
after first applying the en/rebase-backend series just to make sure
we're doing an apples to apples comparison. However, I suspect that
empty commits probably weren't much of a factor and you did find some
interesting things...

> > 3400.2: rebase on top of a lot of unrelated changes             0.24(0.21+0.02)
> > 3400.4: rebase a lot of unrelated changes without split-index   5.60(5.32+0.27)
> > 3400.6: rebase a lot of unrelated changes with split-index      5.67(5.40+0.26)
>
> This comes from two things:
>
> 1. There are not enough shared entries in the index with the sequencer.
>
> do_write_index() is called only by do_write_locked_index() with `--am',
> but is also called by write_shared_index() with the sequencer once for
> every other commit. The latter is only called by
> write_locked_index(), which means that too_many_not_shared_entries()
> returns true for the sequencer, but never for `--am'.
>
> Removing the call to discard_index() in do_pick_commit() (as in the
> first attached patch) solves this particular issue, but this would
> require a more thorough analysis to see if it is actually safe to do.

I'm actually surprised the sequencer would call discard_index(); I
would have thought it would have relied on merge_recursive() to do the
necessary index changes and updates other than writing the new index
out. But I'm not quite as familiar with the sequencer so perhaps
there's some reason I'm unaware of. (Any chance this is a left-over
from when sequencer invoked external scripts to do the work, and thus
the index was updated in another process's memory and on disk, and it
had to discard and re-read to get its own process updated?)

> After this, ce_write() is still called much more by the sequencer.
>
> Here are the results of `rebase --quiet' without discarding the index:
>
> > 3400.2: rebase on top of a lot of unrelated changes             0.23(0.19+0.04)
> > 3400.4: rebase a lot of unrelated changes without split-index   5.14(4.95+0.18)
> > 3400.6: rebase a lot of unrelated changes with split-index      5.02(4.87+0.15)
> The performance of the rebase is better in both cases.

Nice. :-)

> 2. The base index is dropped by unpack_trees_start() and unpack_trees().
>
> Now, write_shared_index() is no longer called and write_locked_index()
> is less expensive than before according to callgrind. But
> ce_write_entry() is still called 749 302 times (which is even more than
> before.)
>
> The only place where ce_write_entry() is called is in a loop in
> do_write_index(). The number of iterations is dictated by the size of
> the cache, and there is a trace2 probe dumping this value.
>
> For `--am', the value goes like this: 1, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4,
> 4, 4, 5, 5, 5, 5, … up until 101.
>
> For the sequencer, it goes like this: 1, 1, 3697, 3697, 3698, 3698,
> 3699, 3699, … up until 3796.
>
> The size of the cache is set in prepare_to_write_split_index(). It
> grows if a cache entry has no index (most of them should have one by
> now), or if the split index has no base index (with `--am', the split
> index always has a base.) This comes from unpack_trees_start() -- it
> creates a new index, and unpack_trees() does not carry the base index,
> hence the size of the cache.
>
> The second attached patch (which is broken for the non-interactive
> rebase case) demonstrates what we could expect for the split-index case
> if we fix this:
>
> > 3400.2: rebase on top of a lot of unrelated changes             0.24(0.21+0.03)
> > 3400.4: rebase a lot of unrelated changes without split-index   5.81(5.62+0.17)
> > 3400.6: rebase a lot of unrelated changes with split-index      4.76(4.54+0.20)
> So, for everything related to the index, I think that’s it.
>
> [1] Numbers may vary, but they should remain in the same order of magnitude.

Unfortunately, this patch as-is breaks some important things even if
it only shows up in a few testcases. merge-recursive needs to know
both what the index looked like before the merge started, as well as
what it looks like after unpack-trees runs; see commits 1de70dbd1a
(merge-recursive: fix check for skipability of working tree updates,
2018-04-19) and a35edc84bd (merge-recursive: fix was_tracked() to quit
lying with some renamed paths, 2018-04-19), and maybe a few others from
that series.

But, noting that it comes from the differences in the index as
unpack_trees runs is useful info. I might be restructuring this code
somewhat significantly but it helps to have this in mind; I may spot
opportunities to do something with it while I'm digging in...

Elijah

^ permalink raw reply [flat|nested] 11+ messages in thread
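Point 1 of the analysis quoted above hinges on whether too_many_not_shared_entries() returns true. The following is a simplified, hypothetical model of such a percentage-based check, not the actual function from git's source. It assumes a threshold like the documented `splitIndex.maxPercentChange` setting (default 20) and ignores any special handling of boundary values. The function name and parameters are made up for illustration.

```c
#include <assert.h>

/*
 * Simplified, hypothetical model of a "too many not shared entries" check
 * (not the code from git itself).
 *
 * total_entries: number of entries in the in-memory index
 * not_shared:    entries that are not present in the shared (base) index
 * max_percent:   threshold in percent (assumed here to default to 20)
 *
 * Returns 1 when the not-shared fraction exceeds the threshold, which in
 * this model means a new shared index would be written; 0 otherwise.
 */
int too_many_not_shared(long total_entries, long not_shared, int max_percent)
{
	assert(total_entries >= 0 && not_shared >= 0);
	assert(not_shared <= total_entries);
	assert(max_percent >= 0 && max_percent <= 100);
	return total_entries * max_percent < not_shared * 100;
}
```

As an illustration only, with inputs loosely based on the cache sizes reported in the analysis (the pairing of totals and not-shared counts is an assumption): `too_many_not_shared(3796, 101, 20)` returns 0, while `too_many_not_shared(3697, 3697, 20)` returns 1. That is consistent with the observation that a freshly created index, where no entry is tied to the base, would trip the check on every write.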
* Re: Comparing rebase --am with --interactive via p3400 2019-12-27 22:45 ` Elijah Newren @ 2019-12-29 17:25 ` Alban Gruin 2020-01-02 20:17 ` Johannes Schindelin 0 siblings, 1 reply; 11+ messages in thread From: Alban Gruin @ 2019-12-29 17:25 UTC (permalink / raw) To: Elijah Newren; +Cc: Johannes Schindelin, Git Mailing List Hi Elijah, Le 27/12/2019 à 23:45, Elijah Newren a écrit : > Hi Alban, > > On Fri, Dec 27, 2019 at 1:11 PM Alban Gruin <alban.gruin@gmail.com> wrote: >> >> Hi Johannes & Elijah, >> >> Le 01/02/2019 à 07:04, Johannes Schindelin a écrit : >>> Hi Elijah, >>> >>> as discussed at the Contributors' Summit, I ran p3400 as-is (i.e. with the >>> --am backend) and then with --keep-empty to force the interactive backend >>> to be used. Here are the best of 10, on my relatively powerful Windows 10 >>> laptop, with current `master`. >>> >>> With regular rebase --am: >>> >>> 3400.2: rebase on top of a lot of unrelated changes 5.32(0.06+0.15) >>> 3400.4: rebase a lot of unrelated changes without split-index 33.08(0.04+0.18) >>> 3400.6: rebase a lot of unrelated changes with split-index 30.29(0.03+0.18) >>> >>> with --keep-empty to force the interactive backend: >>> >>> 3400.2: rebase on top of a lot of unrelated changes 3.92(0.03+0.18) >>> 3400.4: rebase a lot of unrelated changes without split-index 33.92(0.03+0.22) >>> 3400.6: rebase a lot of unrelated changes with split-index 38.82(0.03+0.16) >>> >>> I then changed it to -m to test the current scripted version, trying to >>> let it run overnight, but my laptop eventually went to sleep and the tests >>> were not even done. I'll let them continue and report back. >>> >>> My conclusion after seeing these numbers is: the interactive rebase is >>> really close to the performance of the --am backend. So to me, it makes a >>> total lot of sense to switch --merge over to it, and to make --merge the >>> default. We still should investigate why the split-index performance is so >>> significantly worse, though. 
>>> >>> Ciao, >>> Dscho >>> >> >> I investigated a bit on this. From a quick glance at a callgrind trace, >> I can see that ce_write_entry() is called 20 601[1] times with `git am', >> but 739 802 times with the sequencer when the split-index is enabled. > > Sweet, thanks for digging in and analyzing this. > >> For reference, here are the timings, measured on my Linux machine, on a >> tmpfs, with git.git as the repo: >> >> `rebase --am': >>> 3400.2: rebase on top of a lot of unrelated changes 0.29(0.24+0.03) >>> 3400.4: rebase a lot of unrelated changes without split-index 6.77(6.51+0.22) >>> 3400.6: rebase a lot of unrelated changes with split-index 4.43(4.29+0.13) >> `rebase --quiet': > > --quiet? Isn't that flag supposed to work with both backends and not > imply either one? We previously used --keep-empty, though there's a > chance that flag means we're not doing a fair comparison (since 'am' > will drop empty commits and thus have less work to do). Is there any > chance you actually ran a different command, but when you went to > summarize just typed the wrong flag name? Anyway, the best would > probably be to use --merge here (at the time Johannes and I were > testing, that wouldn't have triggered the sequencer, but it does now), > after first applying the en/rebase-backend series just to make sure > we're doing an apples to apples comparison. However, I suspect that > empty commits probably weren't much of a factor and you did find some > interesting things... > Yes, I did use `--keep-empty' but misremembered it when writing this email… >>> 3400.2: rebase on top of a lot of unrelated changes 0.24(0.21+0.02) >>> 3400.4: rebase a lot of unrelated changes without split-index 5.60(5.32+0.27) >>> 3400.6: rebase a lot of unrelated changes with split-index 5.67(5.40+0.26) >> >> This comes from two things: >> >> 1. There is not enough shared entries in the index with the sequencer. 
>>
>> do_write_index() is called only by do_write_locked_index() with `--am',
>> but is also called by write_shared_index() with the sequencer once for
>> every other commit. The latter is only called by
>> write_locked_index(), which means that too_many_not_shared_entries()
>> returns true for the sequencer, but never for `--am'.
>>
>> Removing the call to discard_index() in do_pick_commit() (as in the
>> first attached patch) solves this particular issue, but this would
>> require a more thorough analysis to see if it is actually safe to do.
>
> I'm actually surprised the sequencer would call discard_index(); I
> would have thought it would have relied on merge_recursive() to do the
> necessary index changes and updates other than writing the new index
> out. But I'm not quite as familiar with the sequencer so perhaps
> there's some reason I'm unaware of. (Any chance this is a left-over
> from when sequencer invoked external scripts to do the work, and thus
> the index was updated in another process's memory and on disk, and it
> had to discard and re-read to get its own process updated?)
>

The sequencer re-reads the index after invoking an external command
(either `git checkout', `git merge' or an `exec' command from the todo
list), which makes sense. But this one seems to come from 6eb1b437933
("cherry-pick/revert: make direct internal call to merge_tree()",
2008-09-02). So, yes, quite old, and perhaps no longer justified.

I know I had to add another discard_cache() in rebase--interactive.c
because it broke something with the submodules, but it does not seem
all that useful now that rebase.c no longer has to fork to use the
sequencer.

Cheers,
Alban

^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Comparing rebase --am with --interactive via p3400 2019-12-29 17:25 ` Alban Gruin @ 2020-01-02 20:17 ` Johannes Schindelin 0 siblings, 0 replies; 11+ messages in thread From: Johannes Schindelin @ 2020-01-02 20:17 UTC (permalink / raw) To: Alban Gruin; +Cc: Elijah Newren, Git Mailing List [-- Attachment #1: Type: text/plain, Size: 6575 bytes --] Hi Alban & Elijah, On Sun, 29 Dec 2019, Alban Gruin wrote: > Hi Elijah, > > Le 27/12/2019 à 23:45, Elijah Newren a écrit : > > Hi Alban, > > > > On Fri, Dec 27, 2019 at 1:11 PM Alban Gruin <alban.gruin@gmail.com> wrote: > >> > >> Hi Johannes & Elijah, > >> > >> Le 01/02/2019 à 07:04, Johannes Schindelin a écrit : > >>> Hi Elijah, > >>> > >>> as discussed at the Contributors' Summit, I ran p3400 as-is (i.e. with the > >>> --am backend) and then with --keep-empty to force the interactive backend > >>> to be used. Here are the best of 10, on my relatively powerful Windows 10 > >>> laptop, with current `master`. > >>> > >>> With regular rebase --am: > >>> > >>> 3400.2: rebase on top of a lot of unrelated changes 5.32(0.06+0.15) > >>> 3400.4: rebase a lot of unrelated changes without split-index 33.08(0.04+0.18) > >>> 3400.6: rebase a lot of unrelated changes with split-index 30.29(0.03+0.18) > >>> > >>> with --keep-empty to force the interactive backend: > >>> > >>> 3400.2: rebase on top of a lot of unrelated changes 3.92(0.03+0.18) > >>> 3400.4: rebase a lot of unrelated changes without split-index 33.92(0.03+0.22) > >>> 3400.6: rebase a lot of unrelated changes with split-index 38.82(0.03+0.16) > >>> > >>> I then changed it to -m to test the current scripted version, trying to > >>> let it run overnight, but my laptop eventually went to sleep and the tests > >>> were not even done. I'll let them continue and report back. > >>> > >>> My conclusion after seeing these numbers is: the interactive rebase is > >>> really close to the performance of the --am backend. 
So to me, it makes a > >>> total lot of sense to switch --merge over to it, and to make --merge the > >>> default. We still should investigate why the split-index performance is so > >>> significantly worse, though. > >>> > >>> Ciao, > >>> Dscho > >>> > >> > >> I investigated a bit on this. From a quick glance at a callgrind trace, > >> I can see that ce_write_entry() is called 20 601[1] times with `git am', > >> but 739 802 times with the sequencer when the split-index is enabled. > > > > Sweet, thanks for digging in and analyzing this. > > > >> For reference, here are the timings, measured on my Linux machine, on a > >> tmpfs, with git.git as the repo: > >> > >> `rebase --am': > >>> 3400.2: rebase on top of a lot of unrelated changes 0.29(0.24+0.03) > >>> 3400.4: rebase a lot of unrelated changes without split-index 6.77(6.51+0.22) > >>> 3400.6: rebase a lot of unrelated changes with split-index 4.43(4.29+0.13) > >> `rebase --quiet': > > > > --quiet? Isn't that flag supposed to work with both backends and not > > imply either one? We previously used --keep-empty, though there's a > > chance that flag means we're not doing a fair comparison (since 'am' > > will drop empty commits and thus have less work to do). Is there any > > chance you actually ran a different command, but when you went to > > summarize just typed the wrong flag name? Anyway, the best would > > probably be to use --merge here (at the time Johannes and I were > > testing, that wouldn't have triggered the sequencer, but it does now), > > after first applying the en/rebase-backend series just to make sure > > we're doing an apples to apples comparison. However, I suspect that > > empty commits probably weren't much of a factor and you did find some > > interesting things... 
> > > > Yes, I did use `--keep-empty' but misremembered it when writing this email… > > >>> 3400.2: rebase on top of a lot of unrelated changes 0.24(0.21+0.02) > >>> 3400.4: rebase a lot of unrelated changes without split-index 5.60(5.32+0.27) > >>> 3400.6: rebase a lot of unrelated changes with split-index 5.67(5.40+0.26) > >> > >> This comes from two things: > >> > >> 1. There is not enough shared entries in the index with the sequencer. > >> > >> do_write_index() is called only by do_write_locked_index() with `--am', > >> but is also called by write_shared_index() with the sequencer once for > >> every other commit. As the latter is only called by > >> write_locked_index(), which means that too_many_not_shared_entries() > >> returns true for the sequencer, but never for `--am'. > >> > >> Removing the call to discard_index() in do_pick_commit() (as in the > >> first attached patch) solve this particular issue, but this would > >> require a more thorough analysis to see if it is actually safe to do. > > > > I'm actually surprised the sequencer would call discard_index(); I > > would have thought it would have relied on merge_recursive() to do the > > necessary index changes and updates other than writing the new index > > out. But I'm not quite as familar with the sequencer so perhaps > > there's some reason I'm unaware of. (Any chance this is a left-over > > from when sequencer invoked external scripts to do the work, and thus > > the index was updated in another processes' memory and on disk, and it > > had to discard and re-read to get its own process updated?) > > > > The sequencer re-reads the index after invoking an external command > (either `git checkout', `git merge' or an `exec' command from the todo > list), which makes sense. But this one seems to come from 6eb1b437933 > ("cherry-pick/revert: make direct internal call to merge_tree()", > 2008-09-02). So, yes, quite old, and perhaps no longer justified. Right. 
This commit also moved the `discard_cache()` call out of the `else` clause of the `if (no_commit)`. That `else` clause goes all the way back to 9509af686bf (Make git-revert & git-cherry-pick a builtin, 2007-03-01), and I freely admit that my memory is no longer fresh on the specifics of this patch. Looking at that patch, I think I simply discarded the index because a subsequent code path would spawn the `git merge-recursive` process, which would have changed the index externally.

> I know I had to add another discard_cache() in rebase--interactive.c
> because it broke something with the submodules, but it does not seem
> all that useful now that rebase.c no longer has to fork to use the
> sequencer.

FWIW I agree. The code is still quite complex at this point, but infinitely more readable (thank you Elijah for taking point on simplifying merge-recursive.c!). So I think that it might be the right point in time to make sure that the index is not discarded and re-read over and over again.

Thanks,
Dscho

^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Comparing rebase --am with --interactive via p3400 2019-12-27 21:11 ` Alban Gruin 2019-12-27 22:45 ` Elijah Newren @ 2020-01-31 21:23 ` Johannes Schindelin 2020-04-01 11:33 ` Alban Gruin 1 sibling, 1 reply; 11+ messages in thread From: Johannes Schindelin @ 2020-01-31 21:23 UTC (permalink / raw) To: Alban Gruin; +Cc: Elijah Newren, git [-- Attachment #1: Type: text/plain, Size: 8064 bytes --] Hi Alban, On Fri, 27 Dec 2019, Alban Gruin wrote: > Le 01/02/2019 à 07:04, Johannes Schindelin a écrit : > > > as discussed at the Contributors' Summit, I ran p3400 as-is (i.e. with the > > --am backend) and then with --keep-empty to force the interactive backend > > to be used. Here are the best of 10, on my relatively powerful Windows 10 > > laptop, with current `master`. > > > > With regular rebase --am: > > > > 3400.2: rebase on top of a lot of unrelated changes 5.32(0.06+0.15) > > 3400.4: rebase a lot of unrelated changes without split-index 33.08(0.04+0.18) > > 3400.6: rebase a lot of unrelated changes with split-index 30.29(0.03+0.18) > > > > with --keep-empty to force the interactive backend: > > > > 3400.2: rebase on top of a lot of unrelated changes 3.92(0.03+0.18) > > 3400.4: rebase a lot of unrelated changes without split-index 33.92(0.03+0.22) > > 3400.6: rebase a lot of unrelated changes with split-index 38.82(0.03+0.16) > > > > I then changed it to -m to test the current scripted version, trying to > > let it run overnight, but my laptop eventually went to sleep and the tests > > were not even done. I'll let them continue and report back. > > > > My conclusion after seeing these numbers is: the interactive rebase is > > really close to the performance of the --am backend. So to me, it makes a > > total lot of sense to switch --merge over to it, and to make --merge the > > default. We still should investigate why the split-index performance is so > > significantly worse, though. > > > > Ciao, > > Dscho > > > > I investigated a bit on this. 
From a quick glance at a callgrind trace, > I can see that ce_write_entry() is called 20 601[1] times with `git am', > but 739 802 times with the sequencer when the split-index is enabled. > > For reference, here are the timings, measured on my Linux machine, on a > tmpfs, with git.git as the repo: > > `rebase --am': > > 3400.2: rebase on top of a lot of unrelated changes 0.29(0.24+0.03) > > 3400.4: rebase a lot of unrelated changes without split-index 6.77(6.51+0.22) > > 3400.6: rebase a lot of unrelated changes with split-index 4.43(4.29+0.13) > `rebase --quiet': > > 3400.2: rebase on top of a lot of unrelated changes 0.24(0.21+0.02) > > 3400.4: rebase a lot of unrelated changes without split-index 5.60(5.32+0.27) > > 3400.6: rebase a lot of unrelated changes with split-index 5.67(5.40+0.26) > > This comes from two things: > > 1. There is not enough shared entries in the index with the sequencer. > > do_write_index() is called only by do_write_locked_index() with `--am', > but is also called by write_shared_index() with the sequencer once for > every other commit. As the latter is only called by > write_locked_index(), which means that too_many_not_shared_entries() > returns true for the sequencer, but never for `--am'. > > Removing the call to discard_index() in do_pick_commit() (as in the > first attached patch) solve this particular issue, but this would > require a more thorough analysis to see if it is actually safe to do. Indeed. I offered these insights in #git-devel (slightly edited): This `discard_index()` is in an awfully central location. I am rather certain that it would cause problems to just remove it. Looking at `do_merge()`: it explicitly discards and re-reads the index if we had to spawn a `git merge` process (which we do if a strategy other than `recursive` was specified, or if it is an octopus merge). But I am wary of other code paths that might not be as careful. I see that `do_exec()` is similarly careful. 
One thing I cannot fail to notice: we do not re-read a changed index after running the `prepare-commit-msg` hook, or for that matter, any other hook. That could even be an old regression from the conversion of the interactive rebase to using the sequencer rather than a shell script.

Further, `reset_merge()` seems to spawn `git reset --merge` without bothering to re-read the possibly modified index. Its callers are `rollback_single_pick()`, `skip_single_pick()` and `sequencer_rollback()`, none of which seem to be careful, either, about checking whether the index was modified in the meantime.

Technically, the in-memory index should also be discarded in `apply_autostash()`, but so far, I don't think we have any callers of that function that want to do anything but release resources and exit.

The `run_git_checkout()` function discards, as intended. I am not quite sure whether it needs to, though, unless the `.git/index` file _was_ modified (it _is_ possible, after all, to run `git rebase -i HEAD`, and I do have a use case for that where one of my scripts generates a todo script, sort of a `git cherry-pick --rebase-merges`, because `cherry-pick` does not support that mode).

The `continue_single_pick()` function spawns a `git commit` which could potentially modify the index through a hook, but the first call site does not care and the second one guards against that (erroring out...).

My biggest concern is with the `run_git_commit()` function: it does not re-read a potentially-modified index (think of hooks).

We will need to be very careful with this `discard_index()`, I think, and in my opinion there is a great opportunity here for cleaning up a little: discarding and re-reading the in-memory index without checking whether the on-disk index has changed at all appears a bit wasteful to me.

This could be refactored into a function that only discards and re-reads the index if the mtime of `.git/index` changed.
That function could then also be taught to detect when the in-memory index has unwritten changes: that would constitute a bug. Ciao, Dscho > > After this, ce_write() is still called much more by the sequencer. > > Here are the results of `rebase --quiet' without discarding the index: > > > 3400.2: rebase on top of a lot of unrelated changes 0.23(0.19+0.04) > > 3400.4: rebase a lot of unrelated changes without split-index 5.14(4.95+0.18) > > 3400.6: rebase a lot of unrelated changes with split-index 5.02(4.87+0.15) > The performance of the rebase is better in the two cases. > > > 2. The base index is dropped by unpack_trees_start() and unpack_trees(). > > Now, write_shared_index() is no longer called and write_locked_index() > is less expensive than before according to callgrind. But > ce_write_entry() is still called 749 302 times (which is even more than > before.) > > The only place where ce_write_entry() is called is in a loop in > do_write_index(). The number of iterations is dictated by the size of > the cache, and there is a trace2 probe dumping this value. > > For `--am', the value goes like this: 1, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4, > 4, 4, 5, 5, 5, 5, … up until 101. > > For the sequencer, it goes like this: 1, 1, 3697, 3697, 3698, 3698, > 3699, 3699, … up until 3796. > > The size of the cache is set in prepare_to_write_split_index(). It > grows if a cache entry has no index (most of them should have one by > now), or if the split index has no base index (with `--am', the split > index always has a base.) This comes from unpack_trees_start() -- it > creates a new index, and unpack_trees() does not carry the base index, > hence the size of the cache. 
> > The second attached patch (which is broken for the non-interactive > rebase case) demonstrates what we could expect for the split-index case > if we fix this: > > > 3400.2: rebase on top of a lot of unrelated changes 0.24(0.21+0.03) > > 3400.4: rebase a lot of unrelated changes without split-index 5.81(5.62+0.17) > > 3400.6: rebase a lot of unrelated changes with split-index 4.76(4.54+0.20) > So, for everything related to the index, I think that’s it. > > [1] Numbers may vary, but they should remain in the same order of magnitude. > > Cheers, > Alban > > ^ permalink raw reply [flat|nested] 11+ messages in thread
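The two ce_write_entry() totals reported above can be cross-checked against the cache-size sequences. The following is only a minimal sketch, assuming (from the listed pattern, not from the trace itself) that with `--am` the size 1 occurs once and each size from 2 to 101 occurs four times, and that with the sequencer the size 1 occurs twice and each size from 3697 to 3796 occurs twice. The helper names are hypothetical and not part of git.

```c
#include <stdio.h>

/*
 * Sum of every integer in [first, last], each counted `repeat` times.
 * n * (first + last) is always even, so the division is exact.
 */
static long repeated_range_sum(long first, long last, long repeat)
{
	long n = last - first + 1;

	return repeat * (n * (first + last) / 2);
}

/* Assumed `--am` pattern: 1, then 2..101 four times each. */
static long am_entries_written(void)
{
	return 1 + repeated_range_sum(2, 101, 4);
}

/* Assumed sequencer pattern: 1, 1, then 3697..3796 twice each. */
static long sequencer_entries_written(void)
{
	return 2 + repeated_range_sum(3697, 3796, 2);
}
```

Under those assumptions the sums come to 20 601 for `--am` and 749 302 for the sequencer, which are the ce_write_entry() counts quoted for `git am` and for the sequencer once discard_index() is removed.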
* Re: Comparing rebase --am with --interactive via p3400 2020-01-31 21:23 ` Johannes Schindelin @ 2020-04-01 11:33 ` Alban Gruin 2020-04-01 14:00 ` Phillip Wood 0 siblings, 1 reply; 11+ messages in thread From: Alban Gruin @ 2020-04-01 11:33 UTC (permalink / raw) To: Johannes Schindelin; +Cc: Elijah Newren, git Hi Johannes, Sorry for the late answer, I was really busy for the last months. Le 31/01/2020 à 22:23, Johannes Schindelin a écrit : > Hi Alban, > -%<- > > Indeed. I offered these insights in #git-devel (slightly edited): > > This `discard_index()` is in an awfully central location. I am rather > certain that it would cause problems to just remove it. > > Looking at `do_merge()`: it explicitly discards and re-reads the index if > we had to spawn a `git merge` process (which we do if a strategy other > than `recursive` was specified, or if it is an octopus merge). But I am > wary of other code paths that might not be as careful. > > I see that `do_exec()` is similarly careful. > I have to admit that the index is not my area of expertise in git, so sorry if my question is stupid: isn't there a less heavy way to find unstaged or uncommitted changes than discarding and then reloading the index? > One thing I cannot fail to notice: we do not re-read a changed index > after running the `prepare-commit-msg` hook, or for that matter, any other > hook. That could even be an old regression from the conversion of the > interactive rebase to using the sequencer rather than a shell script. > > Further, `reset_merge()` seems to spawn `git reset --merge` without > bothering to re-read the possibly modified index. Its callees are > `rollback_single_pick()`, `skip_single_pick()` and `sequencer_rollback()`, > none of which seem to be careful, either, about checking whether the index > was modified in the meantime. 
> > Technically, the in-memory index should also be discarded > in `apply_autostash()`, but so far, we do not have any callers of that > function, I don't think, that wants to do anything but release resources > and exit. > > The `run_git_checkout()` function discards, as intended. I > am not quite sure whether it needs to, though, unless the `.git/index` > file _was_ modified (it _is_ possible, after all, to run `git rebase -i > HEAD`, and I do have a use case for that where one of my scripts generates > a todo script, sort of a `git cherry-pick --rebase-merges`, because > `cherry-pick` does not support that mode). > > The `continue_single_pick()` function spawns a `git > commit` which could potentially modify the index through a hook, but the > first call site does not care and the second one guards against that > (erroring out...). > > My biggest concern is with the `run_git_commit()` function: it does not > re-read a potentially-modified index (think of hooks). Thank you for your analysis. > > We will need to be very careful with this `discard_index()`, I think, and > in my opinion there is a great opportunity here for cleaning up a little: > rather than discarding and re-reading the in-memory index without seeing > whether the on-disk index has changed at all appears a bit wasteful to me. > > This could be refactored into a function that only discards and re-reads > the index if the mtime of `.git/index` changed. That function could then > also be taught to detect when the in-memory index has unwritten changes: > that would constitute a bug. > Hmm, checking if the mtime of the index to see if it changed isn't racy? Sub-second changes should happen, and to quote a comment in is_racy_stat(), "nanosecond timestamped files can also be racy" -- even if it should not really happen in the case of rebase… > Ciao, > Dscho > Alban ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Comparing rebase --am with --interactive via p3400 2020-04-01 11:33 ` Alban Gruin @ 2020-04-01 14:00 ` Phillip Wood 2020-04-04 20:33 ` Johannes Schindelin 0 siblings, 1 reply; 11+ messages in thread From: Phillip Wood @ 2020-04-01 14:00 UTC (permalink / raw) To: Alban Gruin, Johannes Schindelin; +Cc: Elijah Newren, git Hi Alban and Johannes On 01/04/2020 12:33, Alban Gruin wrote: > Hi Johannes, > > Sorry for the late answer, I was really busy for the last months. > > Le 31/01/2020 à 22:23, Johannes Schindelin a écrit : >> Hi Alban, >> -%<- >> >> Indeed. I offered these insights in #git-devel (slightly edited): >> >> This `discard_index()` is in an awfully central location. I am rather >> certain that it would cause problems to just remove it. >> >> Looking at `do_merge()`: it explicitly discards and re-reads the index if >> we had to spawn a `git merge` process (which we do if a strategy other >> than `recursive` was specified, or if it is an octopus merge). But I am >> wary of other code paths that might not be as careful. >> >> I see that `do_exec()` is similarly careful. >> > > I have to admit that the index is not my area of expertise in git, so > sorry if my question is stupid: isn't there a less heavy way to find > unstaged or uncommitted changes than discarding and then reloading the > index? > >> One thing I cannot fail to notice: we do not re-read a changed index >> after running the `prepare-commit-msg` hook, or for that matter, any other >> hook. That could even be an old regression from the conversion of the >> interactive rebase to using the sequencer rather than a shell script. >> >> Further, `reset_merge()` seems to spawn `git reset --merge` without >> bothering to re-read the possibly modified index. Its callees are >> `rollback_single_pick()`, `skip_single_pick()` and `sequencer_rollback()`, >> none of which seem to be careful, either, about checking whether the index >> was modified in the meantime. 
>> >> Technically, the in-memory index should also be discarded >> in `apply_autostash()`, but so far, we do not have any callers of that >> function, I don't think, that wants to do anything but release resources >> and exit. >> >> The `run_git_checkout()` function discards, as intended. I >> am not quite sure whether it needs to, though, unless the `.git/index` >> file _was_ modified (it _is_ possible, after all, to run `git rebase -i >> HEAD`, and I do have a use case for that where one of my scripts generates >> a todo script, sort of a `git cherry-pick --rebase-merges`, because >> `cherry-pick` does not support that mode). I'm not sure it is worth optimizing the case where .git/index is not changed as we only do this once per rebase. In any case I hope that one day we'll stop forking git checkout and use the code from builtin/rebase.c to do it >> The `continue_single_pick()` function spawns a `git >> commit` which could potentially modify the index through a hook, but the >> first call site does not care and the second one guards against that >> (erroring out...). >> >> My biggest concern is with the `run_git_commit()` function: it does not >> re-read a potentially-modified index (think of hooks). I agree that we should be re-reading the index after forking `git commit` and also `git merge`. Most of the time we commit without forking so that should not impact the performance too much > Thank you for your analysis. > >> >> We will need to be very careful with this `discard_index()`, I think, and >> in my opinion there is a great opportunity here for cleaning up a little: >> rather than discarding and re-reading the in-memory index without seeing >> whether the on-disk index has changed at all appears a bit wasteful to me. >> >> This could be refactored into a function that only discards and re-reads >> the index if the mtime of `.git/index` changed. 
>> That function could then
>> also be taught to detect when the in-memory index has unwritten changes:
>> that would constitute a bug.
>>
>
> Hmm, checking if the mtime of the index to see if it changed isn't racy?
> Sub-second changes should happen, and to quote a comment in
> is_racy_stat(), "nanosecond timestamped files can also be racy" -- even
> if it should not really happen in the case of rebase…

I don't think relying on the index stat data is a good idea: git defaults to one-second mtime resolution unless it is built with -DUSE_NSEC, and we do way more than one commit a second. We tried to rely on stat data to determine when to re-read the todo list after an exec and it is broken (both in the design, because it assumes ns mtime resolution, and in the implementation, because we don't update the cached mtime after we rewrite the todo list). There are not that many places where we need to re-read the index, so I think we should just have explicit re-reads where we need them. Hopefully over time we'll stop forking other processes and the problem will go away.

Best Wishes

Phillip

>> Ciao,
>> Dscho
>>
>
> Alban
>

^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Comparing rebase --am with --interactive via p3400 2020-04-01 14:00 ` Phillip Wood @ 2020-04-04 20:33 ` Johannes Schindelin 0 siblings, 0 replies; 11+ messages in thread From: Johannes Schindelin @ 2020-04-04 20:33 UTC (permalink / raw) To: phillip.wood; +Cc: Alban Gruin, Elijah Newren, git [-- Attachment #1: Type: text/plain, Size: 5317 bytes --] Hi Phillip, On Wed, 1 Apr 2020, Phillip Wood wrote: > On 01/04/2020 12:33, Alban Gruin wrote: > > Hi Johannes, > > > > Sorry for the late answer, I was really busy for the last months. > > > > Le 31/01/2020 à 22:23, Johannes Schindelin a écrit : > > > Hi Alban, > > > -%<- > > > > > > Indeed. I offered these insights in #git-devel (slightly edited): > > > > > > This `discard_index()` is in an awfully central location. I am rather > > > certain that it would cause problems to just remove it. > > > > > > Looking at `do_merge()`: it explicitly discards and re-reads the index if > > > we had to spawn a `git merge` process (which we do if a strategy other > > > than `recursive` was specified, or if it is an octopus merge). But I am > > > wary of other code paths that might not be as careful. > > > > > > I see that `do_exec()` is similarly careful. > > > > > > > I have to admit that the index is not my area of expertise in git, so > > sorry if my question is stupid: isn't there a less heavy way to find > > unstaged or uncommitted changes than discarding and then reloading the > > index? > > > > > One thing I cannot fail to notice: we do not re-read a changed index > > > after running the `prepare-commit-msg` hook, or for that matter, any other > > > hook. That could even be an old regression from the conversion of the > > > interactive rebase to using the sequencer rather than a shell script. > > > > > > Further, `reset_merge()` seems to spawn `git reset --merge` without > > > bothering to re-read the possibly modified index. 
Its callees are > > > `rollback_single_pick()`, `skip_single_pick()` and `sequencer_rollback()`, > > > none of which seem to be careful, either, about checking whether the index > > > was modified in the meantime. > > > > > > Technically, the in-memory index should also be discarded > > > in `apply_autostash()`, but so far, we do not have any callers of that > > > function, I don't think, that wants to do anything but release resources > > > and exit. > > > > > > The `run_git_checkout()` function discards, as intended. I > > > am not quite sure whether it needs to, though, unless the `.git/index` > > > file _was_ modified (it _is_ possible, after all, to run `git rebase -i > > > HEAD`, and I do have a use case for that where one of my scripts generates > > > a todo script, sort of a `git cherry-pick --rebase-merges`, because > > > `cherry-pick` does not support that mode). > > I'm not sure it is worth optimizing the case where .git/index is not changed > as we only do this once per rebase. In any case I hope that one day we'll stop > forking git checkout and use the code from builtin/rebase.c to do it > > > > The `continue_single_pick()` function spawns a `git > > > commit` which could potentially modify the index through a hook, but the > > > first call site does not care and the second one guards against that > > > (erroring out...). > > > > > > My biggest concern is with the `run_git_commit()` function: it does not > > > re-read a potentially-modified index (think of hooks). > > I agree that we should be re-reading the index after forking `git commit` and > also `git merge`. Most of the time we commit without forking so that should > not impact the performance too much > > > Thank you for your analysis. 
> >
> > >
> > > We will need to be very careful with this `discard_index()`, I think, and
> > > in my opinion there is a great opportunity here for cleaning up a little:
> > > rather than discarding and re-reading the in-memory index without seeing
> > > whether the on-disk index has changed at all appears a bit wasteful to me.
> > >
> > > This could be refactored into a function that only discards and re-reads
> > > the index if the mtime of `.git/index` changed. That function could then
> > > also be taught to detect when the in-memory index has unwritten changes:
> > > that would constitute a bug.
> > >
> >
> > Hmm, checking if the mtime of the index to see if it changed isn't racy?
> > Sub-second changes should happen, and to quote a comment in
> > is_racy_stat(), "nanosecond timestamped files can also be racy" -- even
> > if it should not really happen in the case of rebase…
>
> I don't think relying on the index stat data is a good idea, git defaults to
> one second mtime resolution unless it is built with -DUSE_NSEC and we do way
> more than one commit a second. We tried to rely on stat data to determine when
> to re-read the todo list after an exec and it is broken (both in the design
> because it assumes ns mtime resolution and the implementation because we don't
> update the cached mtime after we rewrite the todo list). There are not that
> many places where we need to re-read the index so I think we should just have
> explicit re-reads where we need them. Hopefully over time we'll stop forking
> other processes and the problem will go away.

Well. Even the 1-second granularity should buy us some performance if we assume that `same mtime` == `racy`. That should still catch the majority of the cases where the index was simply not changed, at least in the `do_exec()` case.

Ciao,
Dscho

>
> Best Wishes
>
> Phillip
>
> > > Ciao,
> > > Dscho
> > >
> >
> > Alban
> >

^ permalink raw reply [flat|nested] 11+ messages in thread
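The "`same mtime` == `racy`" idea can be sketched as a pure decision function. This is only one possible reading of the suggestion, under the assumptions that mtimes have one-second resolution and that the file system's clock agrees with the wall clock. `struct index_snapshot` and `index_needs_reread()` are hypothetical names, not git APIs.

```c
#include <stdbool.h>
#include <time.h>

/* Hypothetical record of what we knew when we last read .git/index. */
struct index_snapshot {
	time_t mtime;   /* st_mtime of the file at that point */
	time_t read_at; /* wall-clock second at which we read it */
};

/*
 * Decide whether the in-memory index must be discarded and re-read,
 * given the file's current st_mtime (one-second resolution assumed).
 */
static bool index_needs_reread(const struct index_snapshot *snap,
			       time_t mtime_now)
{
	/* A different mtime means somebody rewrote the file. */
	if (mtime_now != snap->mtime)
		return true;
	/*
	 * Same mtime: a write within the same second as our read would be
	 * invisible, so treat that case as racy and re-read to be safe.
	 */
	if (snap->mtime >= snap->read_at)
		return true;
	/* The file was already older than our read and has not changed. */
	return false;
}
```

A caller would fill the snapshot from stat() and time() whenever it reads or writes the index, and refresh it after every write of its own; forgetting that refresh is the implementation problem described above for the todo list.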
Thread overview: 11+ messages
2019-02-01  6:04 Comparing rebase --am with --interactive via p3400 Johannes Schindelin
2019-02-01  7:22 ` Johannes Schindelin
2019-02-01  9:26 ` Elijah Newren
2019-12-27 21:11 ` Alban Gruin
2019-12-27 22:45   ` Elijah Newren
2019-12-29 17:25     ` Alban Gruin
2020-01-02 20:17       ` Johannes Schindelin
2020-01-31 21:23   ` Johannes Schindelin
2020-04-01 11:33     ` Alban Gruin
2020-04-01 14:00       ` Phillip Wood
2020-04-04 20:33         ` Johannes Schindelin