[PATCH 00/11] perf improvements past v1.7.10

git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed

* [PATCH 00/11] perf improvements past v1.7.10
@ 2012-03-12 15:09 Thomas Rast
  2012-03-12 15:09 ` [PATCH 01/11] perf/aggregate: load Git.pm from the build tree Thomas Rast
                   ` (12 more replies)
  0 siblings, 13 replies; 20+ messages in thread
From: Thomas Rast @ 2012-03-12 15:09 UTC (permalink / raw
  To: git

This is what I have collected over a week of playing with the perf
suite.  There is no rush; I am just sending them out for anyone
interested.  If you plan on using the perf suite, I would still
suggest you apply them.

Aside from the new tests, the main goal is the bisection script.
[8/11] is fun too.  Try it!  You will have to install GNU R however.

Thomas Rast (11):
  perf/aggregate: load Git.pm from the build tree
  Introduce a performance test for git-rebase
  Introduce a performance test for git-blame
  perf: display average instead of minimum time
  perf: suppress aggregation also in 'run'
  perf: dereference to a commit when building
  perf: convert realtime to seconds when collecting runs
  perf/aggregate: optionally include a t-test score
  perf/run: allow skipping some revisions
  perf: implement a test-selection feature
  perf: add a bisection tool

 t/perf/README          |    8 +++--
 t/perf/aggregate.perl  |   65 ++++++++++++++++++++++++++--------
 t/perf/bisect_slowdown |   88 ++++++++++++++++++++++++++++++++++++++++++++++
 t/perf/min_time.perl   |   21 -----------
 t/perf/p3400-rebase.sh |   91 ++++++++++++++++++++++++++++++++++++++++++++++++
 t/perf/p8002-blame.sh  |   46 ++++++++++++++++++++++++
 t/perf/perf-lib.sh     |   12 +++++--
 t/perf/run             |   10 ++++--
 t/perf/t_test_score.sh |   24 +++++++++++++
 9 files changed, 322 insertions(+), 43 deletions(-)
 create mode 100755 t/perf/bisect_slowdown
 delete mode 100755 t/perf/min_time.perl
 create mode 100755 t/perf/p3400-rebase.sh
 create mode 100755 t/perf/p8002-blame.sh
 create mode 100755 t/perf/t_test_score.sh

-- 
1.7.10.rc0.230.g16d90

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH 01/11] perf/aggregate: load Git.pm from the build tree
  2012-03-12 15:09 [PATCH 00/11] perf improvements past v1.7.10 Thomas Rast
@ 2012-03-12 15:09 ` Thomas Rast
  2012-03-12 15:09 ` [PATCH 02/11] Introduce a performance test for git-rebase Thomas Rast
                   ` (11 subsequent siblings)
  12 siblings, 0 replies; 20+ messages in thread
From: Thomas Rast @ 2012-03-12 15:09 UTC (permalink / raw
  To: git

As pointed out by Junio, aggregate.perl is not careful enough about
its loading of Git.pm, using the one that is installed on the system.
Load the version from the build tree, using FindBin as proposed by
Jakub.

Signed-off-by: Thomas Rast <trast@student.ethz.ch>
---
 t/perf/aggregate.perl |    3 +++
 1 file changed, 3 insertions(+)

diff --git a/t/perf/aggregate.perl b/t/perf/aggregate.perl
index 15f7fc1..c0afa0b 100755
--- a/t/perf/aggregate.perl
+++ b/t/perf/aggregate.perl
@@ -2,6 +2,9 @@
 
 use strict;
 use warnings;
+use FindBin;
+use lib "$FindBin::Bin/../../perl/blib/lib",
+	"$FindBin::Bin/../../perl/blib/arch/auto/Git";
 use Git;
 
 sub get_times {
-- 
1.7.10.rc0.230.g16d90

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 02/11] Introduce a performance test for git-rebase
  2012-03-12 15:09 [PATCH 00/11] perf improvements past v1.7.10 Thomas Rast
  2012-03-12 15:09 ` [PATCH 01/11] perf/aggregate: load Git.pm from the build tree Thomas Rast
@ 2012-03-12 15:09 ` Thomas Rast
  2012-03-12 17:41   ` Thomas Rast
  2012-03-12 15:09 ` [PATCH 03/11] Introduce a performance test for git-blame Thomas Rast
                   ` (10 subsequent siblings)
  12 siblings, 1 reply; 20+ messages in thread
From: Thomas Rast @ 2012-03-12 15:09 UTC (permalink / raw
  To: git

The perf framework lets the user run the test in an arbitrary repo,
which makes writing a test for rebase a bit finicky.  This one uses a
perl script to determine a longest linear segment of history, that is,
a range A..B which is completely linear.  (For a full clone of
git.git, this is the (whole) 'html' branch.)  The number of commits
that are rebased is capped at 50, however.  That is, if A..B is more
than 50 commits, it uses B~50..B instead.

Having determined a suitable range, it then runs rebase with all
combinations of rerere (on/off), -i / noninteractive, and -m / normal.

Signed-off-by: Thomas Rast <trast@student.ethz.ch>
---
 t/perf/p3400-rebase.sh |   91 ++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 91 insertions(+)
 create mode 100755 t/perf/p3400-rebase.sh

diff --git a/t/perf/p3400-rebase.sh b/t/perf/p3400-rebase.sh
new file mode 100755
index 0000000..7155574
--- /dev/null
+++ b/t/perf/p3400-rebase.sh
@@ -0,0 +1,91 @@
+#!/bin/sh
+
+test_description="Tests git-rebase performance"
+
+. ./perf-lib.sh
+
+test_perf_default_repo
+
+test_expect_success 'find a long run of linear history' '
+	git rev-list --topo-order --parents --all |
+	perl -e '\''$maxL = 0; $maxcommit = undef;
+		while (<>) {
+			chomp;
+			@parents = split;
+			$commit = shift @parents;
+			if ($L{$commit} > $maxL) {
+				$maxL = $L{$commit};
+				$maxcommit = $commit;
+			}
+			if (1 == scalar @parents
+				&& $L{$commit} >= $L{$parents[0]}) {
+				$L{$parents[0]} = $L{$commit}+1;
+				$C{$parents[0]} = $commit;
+			}
+		}
+		$cur = $maxcommit;
+		while (defined $C{$cur}) {
+			$cur = $C{$cur};
+		}
+		if ($maxL > 50) {
+			$maxL = 50;
+		}
+		print "$cur~$maxL\n$cur\n";
+	'\'' >rebase-args &&
+	base_rev=$(sed -n 1p rebase-args) &&
+	tip_rev=$(sed -n 2p rebase-args) &&
+	git checkout -f -b work $tip_rev
+'
+
+export base_rev tip_rev
+
+test_expect_success 'verify linearity' '
+	git rev-list --parents $base_rev.. >list &&
+	! grep "[0-9a-f]+ [0-9a-f]+ [0-9a-f+].*" list
+'
+
+test_expect_success 'disable rerere' '
+	git config rerere.enabled false
+'
+
+test_perf 'rebase -f (rerere off)' '
+	git rebase -f $base_rev
+'
+
+test_perf 'rebase -m -f (rerere off)' '
+	git rebase -m -f $base_rev
+'
+
+test_perf 'rebase -i -f (rerere off)' '
+	GIT_EDITOR=: git rebase -i -f $base_rev
+'
+
+test_perf 'rebase -i -m -f (rerere off)' '
+	GIT_EDITOR=: git rebase -i -m -f $base_rev
+'
+
+test_expect_success 'enable rerere and prime it' '
+	git config rerere.enabled true &&
+	git rebase -f $base_rev &&
+	git rebase -m -f $base_rev &&
+	GIT_EDITOR=: git rebase -i -f $base_rev &&
+	GIT_EDITOR=: git rebase -i -m -f $base_rev
+'
+
+test_perf 'rebase -f (rerere ON)' '
+	git rebase -f $base_rev
+'
+
+test_perf 'rebase -m -f (rerere ON)' '
+	git rebase -m -f $base_rev
+'
+
+test_perf 'rebase -i -f (rerere ON)' '
+	GIT_EDITOR=: git rebase -i -f $base_rev
+'
+
+test_perf 'rebase -i -m -f (rerere ON)' '
+	GIT_EDITOR=: git rebase -i -m -f $base_rev
+'
+
+test_done
-- 
1.7.10.rc0.230.g16d90

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 03/11] Introduce a performance test for git-blame
  2012-03-12 15:09 [PATCH 00/11] perf improvements past v1.7.10 Thomas Rast
  2012-03-12 15:09 ` [PATCH 01/11] perf/aggregate: load Git.pm from the build tree Thomas Rast
  2012-03-12 15:09 ` [PATCH 02/11] Introduce a performance test for git-rebase Thomas Rast
@ 2012-03-12 15:09 ` Thomas Rast
  2012-03-12 15:10 ` [PATCH 04/11] perf: display average instead of minimum time Thomas Rast
                   ` (9 subsequent siblings)
  12 siblings, 0 replies; 20+ messages in thread
From: Thomas Rast @ 2012-03-12 15:09 UTC (permalink / raw
  To: git

It blames a pseudo-randomly chosen (but fixed as long as the tested
repository remains the same) set of 10 files with various options.
The choice of 10 is arbitrary, but gives a nice runtime with git.git.

Signed-off-by: Thomas Rast <trast@student.ethz.ch>
---
 t/perf/p8002-blame.sh |   46 ++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 46 insertions(+)
 create mode 100755 t/perf/p8002-blame.sh

diff --git a/t/perf/p8002-blame.sh b/t/perf/p8002-blame.sh
new file mode 100755
index 0000000..000b9d2
--- /dev/null
+++ b/t/perf/p8002-blame.sh
@@ -0,0 +1,46 @@
+#!/bin/sh
+
+test_description="Tests blame performance"
+
+. ./perf-lib.sh
+
+test_perf_default_repo
+
+# Pick 10 files to blame pseudo-randomly.  The sort key is the blob
+# hash, so it is stable.
+test_expect_success 'setup' '
+	git ls-tree HEAD | grep ^100644 |
+	sort -k 3 | head | cut -f 2 >filelist
+'
+
+test_perf 'blame' '
+	while read -r name; do
+		git blame HEAD -- "$name" >/dev/null
+	done <filelist
+'
+
+test_perf 'blame -M' '
+	while read -r name; do
+		git blame -M HEAD -- "$name" >/dev/null
+	done <filelist
+'
+
+test_perf 'blame -C' '
+	while read -r name; do
+		git blame -C HEAD -- "$name" >/dev/null
+	done <filelist
+'
+
+test_perf 'blame -C -C' '
+	while read -r name; do
+		git blame -C -C HEAD -- "$name" >/dev/null
+	done <filelist
+'
+
+test_perf 'blame -C -C -C' '
+	while read -r name; do
+		git blame -C -C -C HEAD -- "$name" >/dev/null
+	done <filelist
+'
+
+test_done
-- 
1.7.10.rc0.230.g16d90

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 04/11] perf: display average instead of minimum time
  2012-03-12 15:09 [PATCH 00/11] perf improvements past v1.7.10 Thomas Rast
                   ` (2 preceding siblings ...)
  2012-03-12 15:09 ` [PATCH 03/11] Introduce a performance test for git-blame Thomas Rast
@ 2012-03-12 15:10 ` Thomas Rast
  2012-03-12 15:10 ` [PATCH 05/11] perf: suppress aggregation also in 'run' Thomas Rast
                   ` (8 subsequent siblings)
  12 siblings, 0 replies; 20+ messages in thread
From: Thomas Rast @ 2012-03-12 15:10 UTC (permalink / raw
  To: git

The perf tests initially only saved the minimum measurement (chosen
according to wall time).  This is common for timings, as it tends to
measure the shortest time in which the machine can run the test.  It
is also a bit more forgiving for the user if he ran some other task
during one of the tests.

However, experiments with p3400-rebase.sh showed that for
longer-running tasks with a lot of kernel involvement (such as a shell
script and its constant forking) the variance is so high that the
minimum becomes extremely unstable.  The best-of-10 (!) timings
fluctuated by around 5% in extreme cases.

Switch to the average, as that tends to cancel out the variance for
sufficiently large numbers of $GIT_PERF_REPEAT_COUNT.  The catch is
that now we can no longer write off the initial (possibly) cache-cold
timing.  So the test is now run 1+n times, and the first run is
discarded.

Since we're already rewriting that paragraph, also correctly state the
default value of GIT_PERF_REPEAT_COUNT, which is 3.

The shift of the averaging logic to aggregate.perl is not, strictly
speaking, necessary.  However, min_time.perl would have had to be
renamed anyway.  It also sets the stage for the next patch.

Signed-off-by: Thomas Rast <trast@student.ethz.ch>
---
 t/perf/README         |    8 +++++---
 t/perf/aggregate.perl |   25 +++++++++++++++++--------
 t/perf/min_time.perl  |   21 ---------------------
 t/perf/perf-lib.sh    |    5 +++--
 4 files changed, 25 insertions(+), 34 deletions(-)
 delete mode 100755 t/perf/min_time.perl

diff --git a/t/perf/README b/t/perf/README
index b2dbad4..fa94eb5 100644
--- a/t/perf/README
+++ b/t/perf/README
@@ -54,9 +54,11 @@ anything beforehand.
 
 You can set the following variables (also in your config.mak):
 
-    GIT_PERF_REPEAT_COUNT
-	Number of times a test should be repeated for best-of-N
-	measurements.  Defaults to 5.
+    GIT_PERF_REPEAT_COUNT='n'
+	Number of times a test should be repeated.  The test is run
+	'n+1' times: once to warm up the caches and 'n' more times to
+	gather the measurements.  The first run is discarded, and the
+	other 'n' runs are averaged.  Defaults to 3.
 
     GIT_PERF_MAKE_OPTS
 	Options to use when automatically building a git tree for
diff --git a/t/perf/aggregate.perl b/t/perf/aggregate.perl
index c0afa0b..9c781fa 100755
--- a/t/perf/aggregate.perl
+++ b/t/perf/aggregate.perl
@@ -10,13 +10,22 @@
 sub get_times {
 	my $name = shift;
 	open my $fh, "<", $name or return undef;
-	my $line = <$fh>;
-	return undef if not defined $line;
+	my $sum_rt = 0.0;
+	my $sum_u = 0.0;
+	my $sum_s = 0.0;
+	my $n = 0;
+	while (<$fh>) {
+		/^(?:(\d+):)?(\d+):(\d+(?:\.\d+)?) (\d+(?:\.\d+)?) (\d+(?:\.\d+)?)$/
+			or die "bad input line: $_";
+		my $rt = ((defined $1 ? $1 : 0.0)*60+$2)*60+$3;
+		$sum_rt += $rt;
+		$sum_u += $4;
+		$sum_s += $5;
+		$n++;
+	}
+	return undef if !$n;
 	close $fh or die "cannot close $name: $!";
-	$line =~ /^(?:(\d+):)?(\d+):(\d+(?:\.\d+)?) (\d+(?:\.\d+)?) (\d+(?:\.\d+)?)$/
-		or die "bad input line: $line";
-	my $rt = ((defined $1 ? $1 : 0.0)*60+$2)*60+$3;
-	return ($rt, $4, $5);
+	return ($sum_rt/$n, $sum_u/$n, $sum_s/$n);
 }
 
 sub format_times {
@@ -140,8 +149,8 @@ sub have_slash {
 	for my $i (0..$#dirs) {
 		my $d = $dirs[$i];
 		$times{$prefixes{$d}.$t} = [get_times("test-results/$prefixes{$d}$t.times")];
-		my ($r,$u,$s) = @{$times{$prefixes{$d}.$t}};
-		my $w = length format_times($r,$u,$s,$firstr);
+		my ($r,$u,$s,$sign) = @{$times{$prefixes{$d}.$t}};
+		my $w = length format_times($r,$u,$s,$sign,$firstr);
 		$colwidth[$i] = $w if $w > $colwidth[$i];
 		$firstr = $r unless defined $firstr;
 	}
diff --git a/t/perf/min_time.perl b/t/perf/min_time.perl
deleted file mode 100755
index c1a2717..0000000
--- a/t/perf/min_time.perl
+++ /dev/null
@@ -1,21 +0,0 @@
-#!/usr/bin/perl
-
-my $minrt = 1e100;
-my $min;
-
-while (<>) {
-	# [h:]m:s.xx U.xx S.xx
-	/^(?:(\d+):)?(\d+):(\d+(?:\.\d+)?) (\d+(?:\.\d+)?) (\d+(?:\.\d+)?)$/
-		or die "bad input line: $_";
-	my $rt = ((defined $1 ? $1 : 0.0)*60+$2)*60+$3;
-	if ($rt < $minrt) {
-		$min = $_;
-		$minrt = $rt;
-	}
-}
-
-if (!defined $min) {
-	die "no input found";
-}
-
-print $min;
diff --git a/t/perf/perf-lib.sh b/t/perf/perf-lib.sh
index 5580c22..a13f105 100644
--- a/t/perf/perf-lib.sh
+++ b/t/perf/perf-lib.sh
@@ -163,7 +163,7 @@ test_perf () {
 		else
 			echo "perf $test_count - $1:"
 		fi
-		for i in $(seq 1 $GIT_PERF_REPEAT_COUNT); do
+		for i in $(seq 0 $GIT_PERF_REPEAT_COUNT); do
 			say >&3 "running: $2"
 			if test_run_perf_ "$2"
 			then
@@ -184,7 +184,8 @@ test_perf () {
 			test_ok_ "$1"
 		fi
 		base="$perf_results_dir"/"$perf_results_prefix$(basename "$0" .sh)"."$test_count"
-		"$TEST_DIRECTORY"/perf/min_time.perl test_time.* >"$base".times
+		rm test_time.0
+		cat test_time.* >"$base".times
 	fi
 	echo >&3 ""
 }
-- 
1.7.10.rc0.230.g16d90

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 05/11] perf: suppress aggregation also in 'run'
  2012-03-12 15:09 [PATCH 00/11] perf improvements past v1.7.10 Thomas Rast
                   ` (3 preceding siblings ...)
  2012-03-12 15:10 ` [PATCH 04/11] perf: display average instead of minimum time Thomas Rast
@ 2012-03-12 15:10 ` Thomas Rast
  2012-03-12 15:10 ` [PATCH 06/11] perf: dereference to a commit when building Thomas Rast
                   ` (7 subsequent siblings)
  12 siblings, 0 replies; 20+ messages in thread
From: Thomas Rast @ 2012-03-12 15:10 UTC (permalink / raw
  To: git

perf-lib.sh avoids calling aggregate.perl on the results if
$GIT_PERF_AGGREGATING_LATER is set.  'run' uses this to suppress the
table for test runs, so that it can display them all together at the
end.

However, other users may want to do the same while still benefiting
from run's facilities such as automatic compilation/testing of
arbitrary revisions.  So teach 'run' the same interface: when
GIT_PERF_AGGREGATING_LATER is set, do not call aggregate.perl.

Signed-off-by: Thomas Rast <trast@student.ethz.ch>
---
 t/perf/run |    3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/t/perf/run b/t/perf/run
index cfd7012..886290b 100755
--- a/t/perf/run
+++ b/t/perf/run
@@ -69,6 +69,7 @@ run_dirs () {
 	done
 }
 
+orig_GIT_AGGREGATING_LATER=$GIT_PERF_AGGREGATING_LATER
 GIT_PERF_AGGREGATING_LATER=t
 export GIT_PERF_AGGREGATING_LATER
 
@@ -79,4 +80,4 @@ if test $# = 0 -o "$1" = -- -o -f "$1"; then
 	set -- . "$@"
 fi
 run_dirs "$@"
-./aggregate.perl "$@"
+test -z "$orig_GIT_AGGREGATING_LATER" && ./aggregate.perl "$@"
-- 
1.7.10.rc0.230.g16d90

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 06/11] perf: dereference to a commit when building
  2012-03-12 15:09 [PATCH 00/11] perf improvements past v1.7.10 Thomas Rast
                   ` (4 preceding siblings ...)
  2012-03-12 15:10 ` [PATCH 05/11] perf: suppress aggregation also in 'run' Thomas Rast
@ 2012-03-12 15:10 ` Thomas Rast
  2012-03-12 15:10 ` [PATCH 07/11] perf: convert realtime to seconds when collecting runs Thomas Rast
                   ` (6 subsequent siblings)
  12 siblings, 0 replies; 20+ messages in thread
From: Thomas Rast @ 2012-03-12 15:10 UTC (permalink / raw
  To: git

So far the automatic compilation in 'run' just ran rev-parse on the
user input.  This means the user could test once for v1.7.9 and once
for 828ea97 (Git 1.7.9, 2012-01-27) and the script would never realize
that they are actually the same.  Even more confusingly, the user
could test 828ea97d and ask about './aggregate.perl v1.7.9' and it
would not find any results.

Dereference to a commit when constructing the directory name to avoid
this.

Signed-off-by: Thomas Rast <trast@student.ethz.ch>
---
 t/perf/aggregate.perl |    2 +-
 t/perf/run            |    2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/t/perf/aggregate.perl b/t/perf/aggregate.perl
index 9c781fa..4586840 100755
--- a/t/perf/aggregate.perl
+++ b/t/perf/aggregate.perl
@@ -52,7 +52,7 @@ sub format_times {
 	my $dir;
 	last if -f $arg or $arg eq "--";
 	if (! -d $arg) {
-		my $rev = Git::command_oneline(qw(rev-parse --verify), $arg);
+		my $rev = Git::command_oneline(qw(rev-parse --verify), $arg."^{commit}");
 		$dir = "build/".$rev;
 	} else {
 		$arg =~ s{/*$}{};
diff --git a/t/perf/run b/t/perf/run
index 886290b..e4f9c22 100755
--- a/t/perf/run
+++ b/t/perf/run
@@ -45,7 +45,7 @@ run_dirs_helper () {
 		shift
 	fi
 	if [ ! -d "$mydir" ]; then
-		rev=$(git rev-parse --verify "$mydir" 2>/dev/null) ||
+		rev=$(git rev-parse --verify "$mydir^{commit}" 2>/dev/null) ||
 		die "'$mydir' is neither a directory nor a valid revision"
 		if [ ! -d build/$rev ]; then
 			unpack_git_rev $rev
-- 
1.7.10.rc0.230.g16d90

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 07/11] perf: convert realtime to seconds when collecting runs
  2012-03-12 15:09 [PATCH 00/11] perf improvements past v1.7.10 Thomas Rast
                   ` (5 preceding siblings ...)
  2012-03-12 15:10 ` [PATCH 06/11] perf: dereference to a commit when building Thomas Rast
@ 2012-03-12 15:10 ` Thomas Rast
  2012-03-12 15:10 ` [PATCH 08/11] perf/aggregate: optionally include a t-test score Thomas Rast
                   ` (5 subsequent siblings)
  12 siblings, 0 replies; 20+ messages in thread
From: Thomas Rast @ 2012-03-12 15:10 UTC (permalink / raw
  To: git

Our use of 'time' gives results along the lines of

  0:01.04 0.67 0.02

That is, the realtime field is formatted as a H:M:S.CC time with the
hour field optional.  Use a little perl magic to convert it to
seconds, too.  This makes it easier to parse further downstream.

Signed-off-by: Thomas Rast <trast@student.ethz.ch>
---
 t/perf/aggregate.perl |    9 ++++-----
 t/perf/perf-lib.sh    |    3 ++-
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/t/perf/aggregate.perl b/t/perf/aggregate.perl
index 4586840..b04d3a0 100755
--- a/t/perf/aggregate.perl
+++ b/t/perf/aggregate.perl
@@ -15,12 +15,11 @@ sub get_times {
 	my $sum_s = 0.0;
 	my $n = 0;
 	while (<$fh>) {
-		/^(?:(\d+):)?(\d+):(\d+(?:\.\d+)?) (\d+(?:\.\d+)?) (\d+(?:\.\d+)?)$/
+		/^(\d+(?:\.\d+)?) (\d+(?:\.\d+)?) (\d+(?:\.\d+)?)$/
 			or die "bad input line: $_";
-		my $rt = ((defined $1 ? $1 : 0.0)*60+$2)*60+$3;
-		$sum_rt += $rt;
-		$sum_u += $4;
-		$sum_s += $5;
+		$sum_rt += $1;
+		$sum_u += $2;
+		$sum_s += $3;
 		$n++;
 	}
 	return undef if !$n;
diff --git a/t/perf/perf-lib.sh b/t/perf/perf-lib.sh
index a13f105..bfc2926 100644
--- a/t/perf/perf-lib.sh
+++ b/t/perf/perf-lib.sh
@@ -185,7 +185,8 @@ test_perf () {
 		fi
 		base="$perf_results_dir"/"$perf_results_prefix$(basename "$0" .sh)"."$test_count"
 		rm test_time.0
-		cat test_time.* >"$base".times
+		perl -pe 's/(?:(\d+):)?(\d+):(\d+(?:\.\d+)?)/((defined $1?$1:0)*60+$2)*60+$3/e' \
+			test_time.* >"$base".times
 	fi
 	echo >&3 ""
 }
-- 
1.7.10.rc0.230.g16d90

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 08/11] perf/aggregate: optionally include a t-test score
  2012-03-12 15:09 [PATCH 00/11] perf improvements past v1.7.10 Thomas Rast
                   ` (6 preceding siblings ...)
  2012-03-12 15:10 ` [PATCH 07/11] perf: convert realtime to seconds when collecting runs Thomas Rast
@ 2012-03-12 15:10 ` Thomas Rast
  2012-03-12 15:10 ` [PATCH 09/11] perf/run: allow skipping some revisions Thomas Rast
                   ` (4 subsequent siblings)
  12 siblings, 0 replies; 20+ messages in thread
From: Thomas Rast @ 2012-03-12 15:10 UTC (permalink / raw
  To: git

Adds support for calling out to R[1] to perform a significance test on
the gathered results.

More specifically, the script's purpose was always to compare several
revisions (it still makes nice tables for single tests though ;-).
Given the first and some other column (= revision/directory), this
runs Welch's t-test[2] on the two sets of measurements to determine
whether there is a significant difference between the distributions.
It then shows the p-value in a simplified form, so that significant
differences stand out optically.

All of this is entirely optional: if R is not available, it simply
puts nothing in this field.

[1] http://www.r-project.org/
[2] http://en.wikipedia.org/wiki/Welch%27s_t-test

Signed-off-by: Thomas Rast <trast@student.ethz.ch>
---
 t/perf/aggregate.perl  |   37 +++++++++++++++++++++++++++++++------
 t/perf/t_test_score.sh |   24 ++++++++++++++++++++++++
 2 files changed, 55 insertions(+), 6 deletions(-)
 create mode 100755 t/perf/t_test_score.sh

diff --git a/t/perf/aggregate.perl b/t/perf/aggregate.perl
index b04d3a0..4db685d 100755
--- a/t/perf/aggregate.perl
+++ b/t/perf/aggregate.perl
@@ -7,8 +7,12 @@
 	"$FindBin::Bin/../../perl/blib/arch/auto/Git";
 use Git;
 
+my $any_sign_printed = 0;
+
 sub get_times {
 	my $name = shift;
+	my $firstset = shift;
+	my $sig = "";
 	open my $fh, "<", $name or return undef;
 	my $sum_rt = 0.0;
 	my $sum_u = 0.0;
@@ -24,11 +28,21 @@ sub get_times {
 	}
 	return undef if !$n;
 	close $fh or die "cannot close $name: $!";
-	return ($sum_rt/$n, $sum_u/$n, $sum_s/$n);
+	if (defined $firstset &&
+	    open my $ph, "-|", "./t_test_score.sh $name $firstset 2>/dev/null") {
+		my $result = <$ph>;
+		close $ph or die "cannot close pipe to t_test_score.sh: $!";
+		chomp $result;
+		$sig = $result;
+		if ($sig ne "") {
+			$any_sign_printed = 1;
+		}
+	}
+	return ($sum_rt/$n, $sum_u/$n, $sum_s/$n, $sig);
 }
 
 sub format_times {
-	my ($r, $u, $s, $firstr) = @_;
+	my ($r, $u, $s, $sign, $firstr) = @_;
 	if (!defined $r) {
 		return "<missing>";
 	}
@@ -41,6 +55,7 @@ sub format_times {
 		} else {
 			$out .= " +inf";
 		}
+		$out .= $sign;
 	}
 	return $out;
 }
@@ -145,13 +160,17 @@ sub have_slash {
 }
 for my $t (@subtests) {
 	my $firstr;
+	my $firstset;
 	for my $i (0..$#dirs) {
 		my $d = $dirs[$i];
-		$times{$prefixes{$d}.$t} = [get_times("test-results/$prefixes{$d}$t.times")];
+		$times{$prefixes{$d}.$t} = [get_times("test-results/$prefixes{$d}$t.times", $firstset)];
 		my ($r,$u,$s,$sign) = @{$times{$prefixes{$d}.$t}};
 		my $w = length format_times($r,$u,$s,$sign,$firstr);
 		$colwidth[$i] = $w if $w > $colwidth[$i];
-		$firstr = $r unless defined $firstr;
+		if (!defined $firstr) {
+			$firstr = $r;
+			$firstset = "test-results/$prefixes{$d}$t.times";
+		}
 	}
 }
 my $totalwidth = 3*@dirs+$descrlen;
@@ -169,9 +188,15 @@ sub have_slash {
 	my $firstr;
 	for my $i (0..$#dirs) {
 		my $d = $dirs[$i];
-		my ($r,$u,$s) = @{$times{$prefixes{$d}.$t}};
-		printf "   %-$colwidth[$i]s", format_times($r,$u,$s,$firstr);
+		my ($r,$u,$s,$sign) = @{$times{$prefixes{$d}.$t}};
+		printf "   %-$colwidth[$i]s", format_times($r,$u,$s,$sign,$firstr);
 		$firstr = $r unless defined $firstr;
 	}
 	print "\n";
 }
+
+if ($any_sign_printed) {
+	print "-"x$totalwidth, "\n";
+	print "Significance hints:  '.' 0.1  '*' 0.05  '**' 0.01  '***' 0.001\n"
+}
+
diff --git a/t/perf/t_test_score.sh b/t/perf/t_test_score.sh
new file mode 100755
index 0000000..32353d6
--- /dev/null
+++ b/t/perf/t_test_score.sh
@@ -0,0 +1,24 @@
+#!/bin/sh
+
+# If the user doesn't have R, we don't care
+
+command -v R >/dev/null || exit 0
+
+# Uses R to run a t-test on the hypothesis that the elapsed time
+# values in $1 are less than the ones in $2.
+
+pvalue=$(R --no-save --slave <<-EOF
+	a <- read.table("$1")
+	b <- read.table("$2")
+	tst <- t.test(a\$V1, b\$V1)
+	p <- tst\$p.value
+	if (p<0.001) print ("***") \
+	else if (p<0.01) print ("**") \
+	else if (p<0.05) print ("*") \
+	else if (p<0.1) print (".")
+EOF
+)
+
+pvalue=${pvalue##\[1\] \"}
+pvalue=${pvalue%%\"}
+echo "$pvalue"
-- 
1.7.10.rc0.230.g16d90

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 09/11] perf/run: allow skipping some revisions
  2012-03-12 15:09 [PATCH 00/11] perf improvements past v1.7.10 Thomas Rast
                   ` (7 preceding siblings ...)
  2012-03-12 15:10 ` [PATCH 08/11] perf/aggregate: optionally include a t-test score Thomas Rast
@ 2012-03-12 15:10 ` Thomas Rast
  2012-03-12 15:28 ` [PATCH 00/11] perf improvements past v1.7.10 Nguyen Thai Ngoc Duy
                   ` (3 subsequent siblings)
  12 siblings, 0 replies; 20+ messages in thread
From: Thomas Rast @ 2012-03-12 15:10 UTC (permalink / raw
  To: git

You may want to compare two revisions/dirs A and B, but have already
tested one of them, say A.  './run A B' would re-run the tests for
both, which is very time-consuming.  You could invoke

  ./run B && ./aggregate.perl A B

but that is tedious.

Make it so that a ^ prefix means "aggregate this revision, but do not
test it"; i.e., you can say

  ./run ^A B

to achieve what you want.

Signed-off-by: Thomas Rast <trast@student.ethz.ch>
---
 t/perf/aggregate.perl |    1 +
 t/perf/run            |    5 +++++
 2 files changed, 6 insertions(+)

diff --git a/t/perf/aggregate.perl b/t/perf/aggregate.perl
index 4db685d..747f885 100755
--- a/t/perf/aggregate.perl
+++ b/t/perf/aggregate.perl
@@ -65,6 +65,7 @@ sub format_times {
 	my $arg = $ARGV[0];
 	my $dir;
 	last if -f $arg or $arg eq "--";
+	$arg =~ s/^\^// if (! -d $arg);
 	if (! -d $arg) {
 		my $rev = Git::command_oneline(qw(rev-parse --verify), $arg."^{commit}");
 		$dir = "build/".$rev;
diff --git a/t/perf/run b/t/perf/run
index e4f9c22..bc66067 100755
--- a/t/perf/run
+++ b/t/perf/run
@@ -37,6 +37,11 @@ build_git_rev () {
 
 run_dirs_helper () {
 	mydir=${1%/}
+	case "$mydir" in
+		^*)
+			return
+			;;
+	esac
 	shift
 	while test $# -gt 0 -a "$1" != -- -a ! -f "$1"; do
 		shift
-- 
1.7.10.rc0.230.g16d90

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [PATCH 00/11] perf improvements past v1.7.10
  2012-03-12 15:09 [PATCH 00/11] perf improvements past v1.7.10 Thomas Rast
                   ` (8 preceding siblings ...)
  2012-03-12 15:10 ` [PATCH 09/11] perf/run: allow skipping some revisions Thomas Rast
@ 2012-03-12 15:28 ` Nguyen Thai Ngoc Duy
  2012-03-12 16:35   ` Thomas Rast
  2012-03-12 16:09 ` Jakub Narebski
                   ` (2 subsequent siblings)
  12 siblings, 1 reply; 20+ messages in thread
From: Nguyen Thai Ngoc Duy @ 2012-03-12 15:28 UTC (permalink / raw
  To: Thomas Rast; +Cc: git

Not really related to the series, how do I compare the same tests on
different repositories? These repositories share the same DAG, but may
have different object representation (all loose, single pack, zillion
of packs, different versions of pack...)
-- 
Duy

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 00/11] perf improvements past v1.7.10
  2012-03-12 15:09 [PATCH 00/11] perf improvements past v1.7.10 Thomas Rast
                   ` (9 preceding siblings ...)
  2012-03-12 15:28 ` [PATCH 00/11] perf improvements past v1.7.10 Nguyen Thai Ngoc Duy
@ 2012-03-12 16:09 ` Jakub Narebski
  2012-03-12 16:30   ` Thomas Rast
  2012-03-12 16:27 ` [PATCH 10/11] perf: implement a test-selection feature Thomas Rast
  2012-03-12 16:27 ` [PATCH 11/11] perf: add a bisection tool Thomas Rast
  12 siblings, 1 reply; 20+ messages in thread
From: Jakub Narebski @ 2012-03-12 16:09 UTC (permalink / raw
  To: Thomas Rast; +Cc: git

Thomas Rast <trast@student.ethz.ch> writes:

> This is what I have collected over a week of playing with the perf
> suite.  There is no rush; I am just sending them out for anyone
> interested.  If you plan on using the perf suite, I would still
> suggest you apply them.
> 
> Aside from the new tests, the main goal is the bisection script.
> [8/11] is fun too.  Try it!  You will have to install GNU R however.
[...]

>   perf/aggregate: optionally include a t-test score

By the way, have you heard about Dumbbench module on CPAN?  It is a
module that attempts to implement reasonably robust benchmarking,
trying to find and discard bad runs (outliers):

  http://search.cpan.org/perldoc?Dumbbench

It might be an easier solution (thanks to wonders of local::lib) than
GNU R statistical package.

[...]
>   perf: implement a test-selection feature
>   perf: add a bisection tool

It looks like those last two (10/11, 11/11) didn't made it to git
mailing list, probably because of anti-spam size limits of email
message on vger.

Do you have a public repository with this branch somewhere?

-- 
Jakub Narebski

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH 10/11] perf: implement a test-selection feature
  2012-03-12 15:09 [PATCH 00/11] perf improvements past v1.7.10 Thomas Rast
                   ` (10 preceding siblings ...)
  2012-03-12 16:09 ` Jakub Narebski
@ 2012-03-12 16:27 ` Thomas Rast
  2012-03-12 16:27 ` [PATCH 11/11] perf: add a bisection tool Thomas Rast
  12 siblings, 0 replies; 20+ messages in thread
From: Thomas Rast @ 2012-03-12 16:27 UTC (permalink / raw
  To: git

When bisecting for a performance regression, there is usually no point
in running all tests in the pNNNN-foo.sh file, since you are
interested only in the result of a single test.

Make GIT_PERF_TEST_ONLY=pNNNN.I select only the test NNNN.I for
running.  All other test_perf blocks are skipped.  test_expect_success
blocks are deliberately not affected; presumably they consist of setup
code.

Signed-off-by: Thomas Rast <trast@student.ethz.ch>
---
 t/perf/perf-lib.sh |    6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/t/perf/perf-lib.sh b/t/perf/perf-lib.sh
index bfc2926..ddb3272 100644
--- a/t/perf/perf-lib.sh
+++ b/t/perf/perf-lib.sh
@@ -153,7 +153,11 @@ test_perf () {
 	test "$#" = 2 ||
 	error "bug in the test script: not 2 or 3 parameters to test-expect-success"
 	export test_prereq
-	if ! test_skip "$@"
+	skipping=
+	test_skip "$@" && skipping=t
+	test -z "$GIT_PERF_TEST_ONLY" ||
+	test "$GIT_PERF_TEST_ONLY" = $this_test.$test_count || skipping=t
+	if test -z "$skipping"
 	then
 		base=$(basename "$0" .sh)
 		echo "$test_count" >>"$perf_results_dir"/$base.subtests
-- 
1.7.10.rc0.230.g16d90

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 11/11] perf: add a bisection tool
  2012-03-12 15:09 [PATCH 00/11] perf improvements past v1.7.10 Thomas Rast
                   ` (11 preceding siblings ...)
  2012-03-12 16:27 ` [PATCH 10/11] perf: implement a test-selection feature Thomas Rast
@ 2012-03-12 16:27 ` Thomas Rast
  12 siblings, 0 replies; 20+ messages in thread
From: Thomas Rast @ 2012-03-12 16:27 UTC (permalink / raw
  To: git

Using the perf suite for bisection is not hard with all the features
recently added; you can say

  git bisect start --no-checkout
  git bisect bad BAD
  git bisect good GOOD

  # repeat until done:
  ./run BISECT_HEAD ^BAD ^GOOD
  git bisect {good|bad}

This script does the bulk of the work for the user.  You can for
example say

  ./bisect_slowdown p4000.3 v1.7.0 v1.7.10-rc0

to bisect the ~5% performance hit incurred by 3e5a188 (diff.c: Ensure
"index $from..$to" line contains unambiguous SHA1s, 2010-05-30).  The
main output is quite noisy, but you can follow test-results/bisect.log
for a more concise summary of the results and decisions taken.

The heuristic applied is that if the result for BISECT_HEAD is scaled
between GOOD and BAD, then anything <0.25 must be good and >0.75 must
be bad.  Put in terms of differences towards the good and bad sides,
it decides in favor of the smaller one if they are at least a factor
of 3 apart.

Anything in the middle punts and asks for help.  You may then for
example re-run with more iterations to improve the measurements, or
manually edit the test script to make it more expensive (but remember
that you also need to re-test BAD and GOOD in this case!).  Once you
have told git-bisect your decision, you can restart the script to let
it continue the search.

Points of note:

* Performance regressions are not a binary event, so the search may
  not find anything reasonable.  For example, there is a ~6% hit in
  p3400.4 between adc3b2b27..a9f578685, but because it is spread
  across 13 commits, a bisection can only determine the general area.

* The default settings do not have high enough S/N in the timings to
  give good results.  Try setting GIT_PERF_REPEAT_COUNT to around 8.

* Many CPUs have a dynamic overclocking feature (such as Intel Turbo
  Boost) that helps single-process performance.  Running other things
  in parallel may prevent it from working.  The difference may be
  larger than 20% and will usually drown out the performance hit you
  are bisecting for.

Signed-off-by: Thomas Rast <trast@student.ethz.ch>
---
 t/perf/bisect_slowdown |   88 ++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 88 insertions(+)
 create mode 100755 t/perf/bisect_slowdown

diff --git a/t/perf/bisect_slowdown b/t/perf/bisect_slowdown
new file mode 100755
index 0000000..8b58b8d
--- /dev/null
+++ b/t/perf/bisect_slowdown
@@ -0,0 +1,88 @@
+#!/bin/sh
+
+test=$1
+good=$2
+bad=$3
+
+subtest=${test#p????.}
+maintest=${test%%.*}
+
+export subtest maintest
+
+export GIT_PERF_TEST_ONLY=$test
+
+goodrev=$(git rev-parse "$good^{commit}")
+badrev=$(git rev-parse "$bad^{commit}")
+
+git_dir=$(git rev-parse --git-dir)
+perfdir=$(cd "$(dirname "$0")" && pwd)
+log="$perfdir/test-results/bisect.log"
+
+export good bad goodrev badrev perfdir log
+
+if ! test -s "$git_dir/BISECT_LOG"; then
+	# Establish initial timings for the endpoints
+	echo "## Before bisection: comparing $good (good) and $bad (bad)" >"$log"
+	GIT_PERF_AGGREGATING_LATER=t ./run "$good" "$bad" -- "$maintest"*.sh
+	./aggregate.perl "$good" "$bad" -- "$maintest"*.sh | tee -a "$log"
+
+	cd "$perfdir/../.."
+
+	echo "## Starting bisection between $good (good) and $bad (bad)" >>"$log"
+	git bisect start --no-checkout
+	git bisect good "$good"
+	git bisect bad "$bad"
+else
+	cd "$perfdir/../.."
+fi
+
+git bisect run sh -c '
+	cd "$perfdir"
+	echo "## Testing $(git log --no-walk --format="%h %s" BISECT_HEAD)" >>"$log"
+	GIT_PERF_AGGREGATING_LATER=t ./run BISECT_HEAD -- "$maintest"*.sh
+	./aggregate.perl BISECT_HEAD "$good" "$bad" -- "$maintest"*.sh | tee -a "$log"
+	tested=$(git rev-parse BISECT_HEAD)
+	perl -e '\''
+		open STDOUT, "|-", "tee", "-a", "'"$log"'" or die "failed to spawn tee: $!";
+		sub avg {
+			open my $fp, "<$_" or die "cannot open $_: $!";
+			my $s = 0;
+			my $n = 0;
+			while (<$fp>) {
+				if (/^(\d+\.\d+)/) {
+					$s += $1;
+					$n++;
+				}
+			}
+			close $fp or die "cannot close: $!";
+			return $s/$n;
+		}
+		my ($good,$bad,$test) = map { avg $_ } @ARGV;
+		if ($good == $bad) { die "good = bad?" };
+		my $ratio = ($test-$good)/($bad-$good);
+		if ($ratio < 0.25) {
+			print "# Result: GOOD!\n";
+			exit 0;
+		} elsif ($ratio > 0.75) {
+			print "# Result: BAD!\n";
+			exit 42;
+		} else {
+			print "# INCONCLUSIVE.\n";
+			print "# Please check manually, say git bisect good/bad, and restart me!\n";
+			exit 1;
+		}
+	'\'' \
+		test-results/build_"$goodrev"."$maintest"*."$subtest".times \
+		test-results/build_"$badrev"."$maintest"*."$subtest".times \
+		test-results/build_"$tested"."$maintest"*."$subtest".times
+	ret=$?
+	test $ret = 0 && exit 0
+	test $ret = 42 && exit 1
+	exit 255
+' || exit $?
+
+cd "$perfdir"
+echo "## Double-checking $(git log --no-walk --format="%h %s" bisect/bad)" >>"$log"
+GIT_PERF_AGGREGATING_LATER=t ./run bisect/bad^ bisect/bad -- "$maintest"*.sh
+git show bisect/bad
+./aggregate.perl bisect/bad^ bisect/bad -- "$maintest"*.sh | tee -a "$log"
-- 
1.7.10.rc0.230.g16d90

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [PATCH 00/11] perf improvements past v1.7.10
  2012-03-12 16:09 ` Jakub Narebski
@ 2012-03-12 16:30   ` Thomas Rast
  0 siblings, 0 replies; 20+ messages in thread
From: Thomas Rast @ 2012-03-12 16:30 UTC (permalink / raw
  To: Jakub Narebski; +Cc: Thomas Rast, git

Jakub Narebski <jnareb@gmail.com> writes:

> Thomas Rast <trast@student.ethz.ch> writes:
>
>>   perf/aggregate: optionally include a t-test score
>
> By the way, have you heard about Dumbbench module on CPAN?  It is a
> module that attempts to implement reasonably robust benchmarking,
> trying to find and discard bad runs (outliers):
>
>   http://search.cpan.org/perldoc?Dumbbench
>
> It might be an easier solution (thanks to wonders of local::lib) than
> GNU R statistical package.

Thanks for that pointer, I'll look into it.  I'm less worried about
which particular package to use for the t-test (after all it's a pretty
well-known method and could also be hacked in directly), but from a
quick glance at the TOC they seem to have put a lot of thought into the
post-processing.

>>   perf: implement a test-selection feature
>>   perf: add a bisection tool
>
> It looks like those last two (10/11, 11/11) didn't made it to git
> mailing list, probably because of anti-spam size limits of email
> message on vger.

Oops, my bad.  It's not the filter but a 0 too many in my send-email
globs.  I sent them out, but for your merging convenience it's also at

  git://github.com/trast/git.git t/perf-gather-post-1.7.10

-- 
Thomas Rast
trast@{inf,student}.ethz.ch

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 00/11] perf improvements past v1.7.10
  2012-03-12 15:28 ` [PATCH 00/11] perf improvements past v1.7.10 Nguyen Thai Ngoc Duy
@ 2012-03-12 16:35   ` Thomas Rast
  0 siblings, 0 replies; 20+ messages in thread
From: Thomas Rast @ 2012-03-12 16:35 UTC (permalink / raw
  To: Nguyen Thai Ngoc Duy; +Cc: Thomas Rast, git

Nguyen Thai Ngoc Duy <pclouds@gmail.com> writes:

> Not really related to the series, how do I compare the same tests on
> different repositories? These repositories share the same DAG, but may
> have different object representation (all loose, single pack, zillion
> of packs, different versions of pack...)

Currently the only way is to run the test once for each setting of
GIT_PERF_REPO.  The only comparisons it does so far is between
revisions/build directories.

That's just because at the time, I saw neither a clear need nor a simple
syntax that I could use.  I'd welcome suggestions, however.

My only use-case that 'run' does not handle so far would just be
mirroring the table, i.e., getting a comparison of different tests (such
as 'rebase', 'rebase -m' etc.)  from the same build tree.

-- 
Thomas Rast
trast@{inf,student}.ethz.ch

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 02/11] Introduce a performance test for git-rebase
  2012-03-12 15:09 ` [PATCH 02/11] Introduce a performance test for git-rebase Thomas Rast
@ 2012-03-12 17:41   ` Thomas Rast
  2012-03-12 19:45     ` Junio C Hamano
  0 siblings, 1 reply; 20+ messages in thread
From: Thomas Rast @ 2012-03-12 17:41 UTC (permalink / raw
  To: Thomas Rast; +Cc: git

Thomas Rast <trast@student.ethz.ch> writes:

> The perf framework lets the user run the test in an arbitrary repo,
> which makes writing a test for rebase a bit finicky.  This one uses a
> perl script to determine a longest linear segment of history, that is,
> a range A..B which is completely linear.  (For a full clone of
> git.git, this is the (whole) 'html' branch.)  The number of commits
> that are rebased is capped at 50, however.  That is, if A..B is more
> than 50 commits, it uses B~50..B instead.
>
> Having determined a suitable range, it then runs rebase with all
> combinations of rerere (on/off), -i / noninteractive, and -m / normal.
>
> Signed-off-by: Thomas Rast <trast@student.ethz.ch>
> ---

I forgot to put the interesting part here:

 Test                                   this tree
 ------------------------------------------------------
 3400.4: rebase -f (rerere off)         4.64(2.12+1.80)
 3400.5: rebase -m -f (rerere off)      4.17(2.07+1.53)  to .4: -10% ***
 3400.6: rebase -i -f (rerere off)      4.51(2.30+1.57)  to .4:  -3% *
 3400.7: rebase -i -m -f (rerere off)   4.51(2.31+1.57)
 ------------------------------------------------------
 Significance hints:  '.' 0.1  '*' 0.05  '**' 0.01  '***' 0.001

I did the change/significance manually here, so I might have made a
mistake in the exact numbers.

However, it's pretty obvious that 'rebase -f' is slow compared to
'rebase -m'.  So slow in fact that the user would be better off running
an *interactive* rebase.

So what's the advantage of not using -m?  I always thought it was
because plain rebase was supposed to be *faster* than -m, and I
confirmed this in an extremely scientific test on #git-devel:

  [09 Mar 14:05] <charon> hrm, what's the advantage of straight rebase over 'rebase -m' ?
  [09 Mar 14:06] <mhagger> charon: It's supposed to be significantly faster
  [09 Mar 14:06] <charon> mhagger: ah thanks, that's what i suspected

This wasn't always the case; back in v1.7.0 you would see this picture:

 Test                                v1.7.0
 ---------------------------------------------------
 3400.4: rebase -f (rerere off)      3.75(2.12+1.18)
 3400.5: rebase -m -f (rerere off)   4.05(2.29+1.32)  to .4: +8% ***

This comes from the i18n support added in adc3b2b27..a9f578685, and from
43c2325 (am: use get_author_ident_from_commit instead of mailinfo when
rebasing, 2010-06-16).

Now I'm not really keen on hacking on git-am until it gets back its
performance, as for most uses it's probably still fast enough.  But I
think one important question is: can we get rid of the
git-format-patch|git-am mode of git-rebase, which has historically been
a source of pain (see, e.g., the aforementioned 43c2325)?

Ok, perhaps not: certain features like --whitespace depend on it :-(

--
Thomas Rast
trast@{inf,student}.ethz.ch

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 02/11] Introduce a performance test for git-rebase
  2012-03-12 17:41   ` Thomas Rast
@ 2012-03-12 19:45     ` Junio C Hamano
  2012-03-12 20:20       ` Thomas Rast
  0 siblings, 1 reply; 20+ messages in thread
From: Junio C Hamano @ 2012-03-12 19:45 UTC (permalink / raw
  To: Thomas Rast; +Cc: Thomas Rast, git

Thomas Rast <trast@inf.ethz.ch> writes:

> However, it's pretty obvious that 'rebase -f' is slow compared to
> 'rebase -m'.  So slow in fact that the user would be better off running
> an *interactive* rebase.

The mention of "-f" got me off track, but does "-m" avoid creating
new commits when it does not have to?  A side effect "-f" has is to
replay _all_ the commits, if I recall correctly.

> This comes from the i18n support added in adc3b2b27..a9f578685, and from
> 43c2325 (am: use get_author_ident_from_commit instead of mailinfo when
> rebasing, 2010-06-16).

> Now I'm not really keen on hacking on git-am until it gets back its
> performance, as for most uses it's probably still fast enough.  But I
> think one important question is: can we get rid of the
> git-format-patch|git-am mode of git-rebase, which has historically been
> a source of pain (see, e.g., the aforementioned 43c2325)?

In the longer term, the right way forward is to try doing less in
the patch mode in rebase.  For example, the command could make the
list of commits to be replayed exactly the same way as it does for
flattening cherry-pick rebase, and instead of running "am", run
"format-patch --stdout | apply --3way --whitespace=fix --index" and
commit the result using "commit -C".  That way it can depend on
"commit -C" and does not have to worry about stupid things like
43c2325.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 02/11] Introduce a performance test for git-rebase
  2012-03-12 19:45     ` Junio C Hamano
@ 2012-03-12 20:20       ` Thomas Rast
  2012-03-12 20:59         ` Junio C Hamano
  0 siblings, 1 reply; 20+ messages in thread
From: Thomas Rast @ 2012-03-12 20:20 UTC (permalink / raw
  To: Junio C Hamano; +Cc: git

Junio C Hamano <gitster@pobox.com> writes:

> Thomas Rast <trast@inf.ethz.ch> writes:
>
>> However, it's pretty obvious that 'rebase -f' is slow compared to
>> 'rebase -m'.  So slow in fact that the user would be better off running
>> an *interactive* rebase.
>
> The mention of "-f" got me off track, but does "-m" avoid creating
> new commits when it does not have to?  A side effect "-f" has is to
> replay _all_ the commits, if I recall correctly.

Sorry, my bad.  *All* tests use -f for exactly that reason.

>> Now I'm not really keen on hacking on git-am until it gets back its
>> performance, as for most uses it's probably still fast enough.  But I
>> think one important question is: can we get rid of the
>> git-format-patch|git-am mode of git-rebase, which has historically been
>> a source of pain (see, e.g., the aforementioned 43c2325)?
>
> In the longer term, the right way forward is to try doing less in
> the patch mode in rebase.  For example, the command could make the
> list of commits to be replayed exactly the same way as it does for
> flattening cherry-pick rebase, and instead of running "am", run
> "format-patch --stdout | apply --3way --whitespace=fix --index" and
> commit the result using "commit -C".  That way it can depend on
> "commit -C" and does not have to worry about stupid things like
> 43c2325.

Ah, but that's a trap, or do you have an apply --3way series sitting
somewhere ;-)

-- 
Thomas Rast
trast@{inf,student}.ethz.ch

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 02/11] Introduce a performance test for git-rebase
  2012-03-12 20:20       ` Thomas Rast
@ 2012-03-12 20:59         ` Junio C Hamano
  0 siblings, 0 replies; 20+ messages in thread
From: Junio C Hamano @ 2012-03-12 20:59 UTC (permalink / raw
  To: Thomas Rast; +Cc: git

Thomas Rast <trast@inf.ethz.ch> writes:

> Ah, but that's a trap, or do you have an apply --3way series sitting
> somewhere ;-)

According to rumors, it is more trivial than a typical summer
project for a GSoC student, so...

^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2012-03-12 20:59 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2012-03-12 15:09 [PATCH 00/11] perf improvements past v1.7.10 Thomas Rast
2012-03-12 15:09 ` [PATCH 01/11] perf/aggregate: load Git.pm from the build tree Thomas Rast
2012-03-12 15:09 ` [PATCH 02/11] Introduce a performance test for git-rebase Thomas Rast
2012-03-12 17:41   ` Thomas Rast
2012-03-12 19:45     ` Junio C Hamano
2012-03-12 20:20       ` Thomas Rast
2012-03-12 20:59         ` Junio C Hamano
2012-03-12 15:09 ` [PATCH 03/11] Introduce a performance test for git-blame Thomas Rast
2012-03-12 15:10 ` [PATCH 04/11] perf: display average instead of minimum time Thomas Rast
2012-03-12 15:10 ` [PATCH 05/11] perf: suppress aggregation also in 'run' Thomas Rast
2012-03-12 15:10 ` [PATCH 06/11] perf: dereference to a commit when building Thomas Rast
2012-03-12 15:10 ` [PATCH 07/11] perf: convert realtime to seconds when collecting runs Thomas Rast
2012-03-12 15:10 ` [PATCH 08/11] perf/aggregate: optionally include a t-test score Thomas Rast
2012-03-12 15:10 ` [PATCH 09/11] perf/run: allow skipping some revisions Thomas Rast
2012-03-12 15:28 ` [PATCH 00/11] perf improvements past v1.7.10 Nguyen Thai Ngoc Duy
2012-03-12 16:35   ` Thomas Rast
2012-03-12 16:09 ` Jakub Narebski
2012-03-12 16:30   ` Thomas Rast
2012-03-12 16:27 ` [PATCH 10/11] perf: implement a test-selection feature Thomas Rast
2012-03-12 16:27 ` [PATCH 11/11] perf: add a bisection tool Thomas Rast

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).