git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Alban Gruin <alban.gruin@gmail.com>
To: Johannes Schindelin <Johannes.Schindelin@gmx.de>,
	Elijah Newren <newren@gmail.com>
Cc: git@vger.kernel.org
Subject: Re: Comparing rebase --am with --interactive via p3400
Date: Fri, 27 Dec 2019 22:11:36 +0100	[thread overview]
Message-ID: <a00e1689-ec7c-4039-a2e9-f72d452ae4ff@gmail.com> (raw)
In-Reply-To: <nycvar.QRO.7.76.6.1901312310280.41@tvgsbejvaqbjf.bet>

[-- Attachment #1: Type: text/plain, Size: 5067 bytes --]

Hi Johannes & Elijah,

Le 01/02/2019 à 07:04, Johannes Schindelin a écrit :
> Hi Elijah,
> 
> as discussed at the Contributors' Summit, I ran p3400 as-is (i.e. with the
> --am backend) and then with --keep-empty to force the interactive backend
> to be used. Here are the best of 10, on my relatively powerful Windows 10
> laptop, with current `master`.
> 
> With regular rebase --am:
> 
> 3400.2: rebase on top of a lot of unrelated changes             5.32(0.06+0.15)
> 3400.4: rebase a lot of unrelated changes without split-index   33.08(0.04+0.18)
> 3400.6: rebase a lot of unrelated changes with split-index      30.29(0.03+0.18)
> 
> with --keep-empty to force the interactive backend:
> 
> 3400.2: rebase on top of a lot of unrelated changes             3.92(0.03+0.18)
> 3400.4: rebase a lot of unrelated changes without split-index   33.92(0.03+0.22)
> 3400.6: rebase a lot of unrelated changes with split-index      38.82(0.03+0.16)
> 
> I then changed it to -m to test the current scripted version, trying to
> let it run overnight, but my laptop eventually went to sleep and the tests
> were not even done. I'll let them continue and report back.
> 
> My conclusion after seeing these numbers is: the interactive rebase is
> really close to the performance of the --am backend. So to me, it makes a
> total lot of sense to switch --merge over to it, and to make --merge the
> default. We still should investigate why the split-index performance is so
> significantly worse, though.
> 
> Ciao,
> Dscho
> 

I investigated a bit on this.  From a quick glance at a callgrind trace,
I can see that ce_write_entry() is called 20 601[1] times with `git am',
but 739 802 times with the sequencer when the split-index is enabled.

For reference, here are the timings, measured on my Linux machine, on a
tmpfs, with git.git as the repo:

`rebase --am':
> 3400.2: rebase on top of a lot of unrelated changes             0.29(0.24+0.03)
> 3400.4: rebase a lot of unrelated changes without split-index   6.77(6.51+0.22)
> 3400.6: rebase a lot of unrelated changes with split-index      4.43(4.29+0.13)
`rebase --quiet':
> 3400.2: rebase on top of a lot of unrelated changes             0.24(0.21+0.02)
> 3400.4: rebase a lot of unrelated changes without split-index   5.60(5.32+0.27)
> 3400.6: rebase a lot of unrelated changes with split-index      5.67(5.40+0.26)

This comes from two things:

1. There is not enough shared entries in the index with the sequencer.

do_write_index() is called only by do_write_locked_index() with `--am',
but is also called by write_shared_index() with the sequencer once for
every other commit.  As the latter is only called by
write_locked_index(), which means that too_many_not_shared_entries()
returns true for the sequencer, but never for `--am'.

Removing the call to discard_index() in do_pick_commit() (as in the
first attached patch) solve this particular issue, but this would
require a more thorough analysis to see if it is actually safe to do.

After this, ce_write() is still called much more by the sequencer.

Here are the results of `rebase --quiet' without discarding the index:

> 3400.2: rebase on top of a lot of unrelated changes             0.23(0.19+0.04)
> 3400.4: rebase a lot of unrelated changes without split-index   5.14(4.95+0.18)
> 3400.6: rebase a lot of unrelated changes with split-index      5.02(4.87+0.15)
The performance of the rebase is better in the two cases.


2. The base index is dropped by unpack_trees_start() and unpack_trees().

Now, write_shared_index() is no longer called and write_locked_index()
is less expensive than before according to callgrind.  But
ce_write_entry() is still called 749 302 times (which is even more than
before.)

The only place where ce_write_entry() is called is in a loop in
do_write_index().  The number of iterations is dictated by the size of
the cache, and there is a trace2 probe dumping this value.

For `--am', the value goes like this: 1, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4,
4, 4, 5, 5, 5, 5, … up until 101.

For the sequencer, it goes like this: 1, 1, 3697, 3697, 3698, 3698,
3699, 3699, … up until 3796.

The size of the cache is set in prepare_to_write_split_index().  It
grows if a cache entry has no index (most of them should have one by
now), or if the split index has no base index (with `--am', the split
index always has a base.)  This comes from unpack_trees_start() -- it
creates a new index, and unpack_trees() does not carry the base index,
hence the size of the cache.

The second attached patch (which is broken for the non-interactive
rebase case) demonstrates what we could expect for the split-index case
if we fix this:

> 3400.2: rebase on top of a lot of unrelated changes             0.24(0.21+0.03)
> 3400.4: rebase a lot of unrelated changes without split-index   5.81(5.62+0.17)
> 3400.6: rebase a lot of unrelated changes with split-index      4.76(4.54+0.20)
So, for everything related to the index, I think that’s it.

[1] Numbers may vary, but they should remain in the same order of magnitude.

Cheers,
Alban


[-- Attachment #2: sequencer-rebase-si.patch --]
[-- Type: text/x-patch, Size: 317 bytes --]

diff --git a/sequencer.c b/sequencer.c
index 1bee26ebd5..2831abd0fa 100644
--- a/sequencer.c
+++ b/sequencer.c
@@ -1863,7 +1863,6 @@ static int do_pick_commit(struct repository *r,
 				       NULL, 0))
 			return error_dirty_index(r, opts);
 	}
-	discard_index(r->index);
 
 	if (!commit->parents)
 		parent = NULL;

[-- Attachment #3: merge-recursive-rebase-si.patch --]
[-- Type: text/x-patch, Size: 1367 bytes --]

diff --git a/merge-recursive.c b/merge-recursive.c
index 11869ad81c..47f67079f3 100644
--- a/merge-recursive.c
+++ b/merge-recursive.c
@@ -421,7 +421,7 @@ static int unpack_trees_start(struct merge_options *opt,
 {
 	int rc;
 	struct tree_desc t[3];
-	struct index_state tmp_index = { NULL };
+	/* struct index_state tmp_index = { NULL }; */
 
 	memset(&opt->priv->unpack_opts, 0, sizeof(opt->priv->unpack_opts));
 	if (opt->priv->call_depth)
@@ -432,7 +432,7 @@ static int unpack_trees_start(struct merge_options *opt,
 	opt->priv->unpack_opts.head_idx = 2;
 	opt->priv->unpack_opts.fn = threeway_merge;
 	opt->priv->unpack_opts.src_index = opt->repo->index;
-	opt->priv->unpack_opts.dst_index = &tmp_index;
+	opt->priv->unpack_opts.dst_index = opt->repo->index;
 	opt->priv->unpack_opts.aggressive = !merge_detect_rename(opt);
 	setup_unpack_trees_porcelain(&opt->priv->unpack_opts, "merge");
 
@@ -449,8 +449,8 @@ static int unpack_trees_start(struct merge_options *opt,
 	 * saved copy.  (verify_uptodate() checks src_index, and the original
 	 * index is the one that had the necessary modification timestamps.)
 	 */
-	opt->priv->orig_index = *opt->repo->index;
-	*opt->repo->index = tmp_index;
+	/* opt->priv->orig_index = *opt->repo->index; */
+	/* *opt->repo->index = tmp_index; */
 	opt->priv->unpack_opts.src_index = &opt->priv->orig_index;
 
 	return rc;

  parent reply	other threads:[~2019-12-27 21:15 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-02-01  6:04 Comparing rebase --am with --interactive via p3400 Johannes Schindelin
2019-02-01  7:22 ` Johannes Schindelin
2019-02-01  9:26 ` Elijah Newren
2019-12-27 21:11 ` Alban Gruin [this message]
2019-12-27 22:45   ` Elijah Newren
2019-12-29 17:25     ` Alban Gruin
2020-01-02 20:17       ` Johannes Schindelin
2020-01-31 21:23   ` Johannes Schindelin
2020-04-01 11:33     ` Alban Gruin
2020-04-01 14:00       ` Phillip Wood
2020-04-04 20:33         ` Johannes Schindelin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=a00e1689-ec7c-4039-a2e9-f72d452ae4ff@gmail.com \
    --to=alban.gruin@gmail.com \
    --cc=Johannes.Schindelin@gmx.de \
    --cc=git@vger.kernel.org \
    --cc=newren@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).