git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [PATCH v3 0/3] automatically skip away from broken commits
@ 2009-06-06  4:41 Christian Couder
  2009-06-06  4:41 ` [PATCH v3 1/3] bisect: add parameters to "filter_skipped" Christian Couder
                   ` (3 more replies)
  0 siblings, 4 replies; 37+ messages in thread
From: Christian Couder @ 2009-06-06  4:41 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Sam Vilain, H. Peter Anvin, Ingo Molnar

This patch series makes "git bisect skip" skip away from an area where
the commits cannot be tested.

This is done automatically. The user has nothing todo. We alternatively
try to test commits 20%, 40% and 60% away from the optimal commit that
has been skipped.

Some comments have been added and a left over from v1 has been removed
since v2, thanks to Junio.

  bisect: add parameters to "filter_skipped"
  bisect: when skipping, choose a commit away from a skipped commit
  t6030: test skipping away from an already skipped commit

 bisect.c                    |   88 +++++++++++++++++++++++++++++++++++++++++--
 bisect.h                    |    4 +-
 builtin-rev-list.c          |    4 +-
 t/t6030-bisect-porcelain.sh |   12 ++++++
 4 files changed, 102 insertions(+), 6 deletions(-)

^ permalink raw reply	[flat|nested] 37+ messages in thread

* [PATCH v3 1/3] bisect: add parameters to "filter_skipped"
  2009-06-06  4:41 [PATCH v3 0/3] automatically skip away from broken commits Christian Couder
@ 2009-06-06  4:41 ` Christian Couder
  2009-06-06  4:41 ` [PATCH v3 2/3] bisect: when skipping, choose a commit away from a skipped commit Christian Couder
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 37+ messages in thread
From: Christian Couder @ 2009-06-06  4:41 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Sam Vilain, H. Peter Anvin, Ingo Molnar

because we will need to get more information from this function in
some later patches.

The new "int *count" parameter gives the number of commits left after
the skipped commit have been filtered out.

The new "int *skipped_first" parameter tells us if the first commit
in the list has been skipped. Note that using this parameter also
changes the behavior of the function if the first commit is indeed
skipped. Because we assume that in this case we will want all the
filtered commits, not just the first one, even if "show_all" is not
set.

So using a not NULL "skipped_first" parameter really means that we
plan to choose to test another commit than the first non skipped
one if the first commit in the list is skipped. That in turn means
that, in case the first commit is skipped, we have to return a
fully filtered list.

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
---
 bisect.c           |   40 ++++++++++++++++++++++++++++++++++++----
 bisect.h           |    4 +++-
 builtin-rev-list.c |    4 +++-
 3 files changed, 42 insertions(+), 6 deletions(-)

diff --git a/bisect.c b/bisect.c
index 2c14f4d..115cf5f 100644
--- a/bisect.c
+++ b/bisect.c
@@ -521,14 +521,34 @@ static char *join_sha1_array_hex(struct sha1_array *array, char delim)
 	return strbuf_detach(&joined_hexs, NULL);
 }
 
+/*
+ * In this function, passing a not NULL skipped_first is very special.
+ * It means that we want to know if the first commit in the list is
+ * skipped because we will want to test a commit away from it if it is
+ * indeed skipped.
+ * So if the first commit is skipped, we cannot take the shortcut to
+ * just "return list" when we find the first non skipped commit, we
+ * have to return a fully filtered list.
+ *
+ * We use (*skipped_first == -1) to mean "it has been found that the
+ * first commit is not skipped". In this case *skipped_first is set back
+ * to 0 just before the function returns.
+ */
 struct commit_list *filter_skipped(struct commit_list *list,
 				   struct commit_list **tried,
-				   int show_all)
+				   int show_all,
+				   int *count,
+				   int *skipped_first)
 {
 	struct commit_list *filtered = NULL, **f = &filtered;
 
 	*tried = NULL;
 
+	if (skipped_first)
+		*skipped_first = 0;
+	if (count)
+		*count = 0;
+
 	if (!skipped_revs.sha1_nr)
 		return list;
 
@@ -537,19 +557,31 @@ struct commit_list *filter_skipped(struct commit_list *list,
 		list->next = NULL;
 		if (0 <= lookup_sha1_array(&skipped_revs,
 					   list->item->object.sha1)) {
+			if (skipped_first && !*skipped_first)
+				*skipped_first = 1;
 			/* Move current to tried list */
 			*tried = list;
 			tried = &list->next;
 		} else {
-			if (!show_all)
-				return list;
+			if (!show_all) {
+				if (!skipped_first || !*skipped_first)
+					return list;
+			} else if (skipped_first && !*skipped_first) {
+				/* This means we know it's not skipped */
+				*skipped_first = -1;
+			}
 			/* Move current to filtered list */
 			*f = list;
 			f = &list->next;
+			if (count)
+				(*count)++;
 		}
 		list = next;
 	}
 
+	if (skipped_first && *skipped_first == -1)
+		*skipped_first = 0;
+
 	return filtered;
 }
 
@@ -865,7 +897,7 @@ int bisect_next_all(const char *prefix)
 
 	revs.commits = find_bisection(revs.commits, &reaches, &all,
 				       !!skipped_revs.sha1_nr);
-	revs.commits = filter_skipped(revs.commits, &tried, 0);
+	revs.commits = filter_skipped(revs.commits, &tried, 0, NULL, NULL);
 
 	if (!revs.commits) {
 		/*
diff --git a/bisect.h b/bisect.h
index fb744fd..82f8fc1 100644
--- a/bisect.h
+++ b/bisect.h
@@ -7,7 +7,9 @@ extern struct commit_list *find_bisection(struct commit_list *list,
 
 extern struct commit_list *filter_skipped(struct commit_list *list,
 					  struct commit_list **tried,
-					  int show_all);
+					  int show_all,
+					  int *count,
+					  int *skipped_first);
 
 extern void print_commit_list(struct commit_list *list,
 			      const char *format_cur,
diff --git a/builtin-rev-list.c b/builtin-rev-list.c
index 73bff84..69753dc 100644
--- a/builtin-rev-list.c
+++ b/builtin-rev-list.c
@@ -262,7 +262,9 @@ int show_bisect_vars(struct rev_list_info *info, int reaches, int all)
 	if (!revs->commits && !(flags & BISECT_SHOW_TRIED))
 		return 1;
 
-	revs->commits = filter_skipped(revs->commits, &tried, flags & BISECT_SHOW_ALL);
+	revs->commits = filter_skipped(revs->commits, &tried,
+				       flags & BISECT_SHOW_ALL,
+				       NULL, NULL);
 
 	/*
 	 * revs->commits can reach "reaches" commits among
-- 
1.6.3.GIT

^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH v3 2/3] bisect: when skipping, choose a commit away from a skipped commit
  2009-06-06  4:41 [PATCH v3 0/3] automatically skip away from broken commits Christian Couder
  2009-06-06  4:41 ` [PATCH v3 1/3] bisect: add parameters to "filter_skipped" Christian Couder
@ 2009-06-06  4:41 ` Christian Couder
  2009-06-06  4:41 ` [PATCH v3 3/3] t6030: test skipping away from an already " Christian Couder
  2009-06-06 19:51 ` [PATCH v3 0/3] automatically skip away from broken commits Junio C Hamano
  3 siblings, 0 replies; 37+ messages in thread
From: Christian Couder @ 2009-06-06  4:41 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Sam Vilain, H. Peter Anvin, Ingo Molnar

To do that a new function "apply_skip_ratio" is added and another
function "managed_skipped" is created to wrap both "filter_skipped"
and the previous one.

In "managed_skipped" we detect when we should choose a commit away
from a skipped one and then we automatically choose a skip ratio
to pass to "apply_skip_ratio".

The ratio is choosen so that it alternates between 1/5, 2/5 and
3/5.

In "apply_skip_ratio", we ignore a given ratio of all the commits
that could be tested.

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
---
 bisect.c |   50 +++++++++++++++++++++++++++++++++++++++++++++++++-
 1 files changed, 49 insertions(+), 1 deletions(-)

diff --git a/bisect.c b/bisect.c
index 115cf5f..6fdff05 100644
--- a/bisect.c
+++ b/bisect.c
@@ -585,6 +585,54 @@ struct commit_list *filter_skipped(struct commit_list *list,
 	return filtered;
 }
 
+static struct commit_list *apply_skip_ratio(struct commit_list *list,
+					    int count,
+					    int skip_num, int skip_denom)
+{
+	int index, i;
+	struct commit_list *cur, *previous;
+
+	cur = list;
+	previous = NULL;
+	index = count * skip_num / skip_denom;
+
+	for (i = 0; cur; cur = cur->next, i++) {
+		if (i == index) {
+			if (hashcmp(cur->item->object.sha1, current_bad_sha1))
+				return cur;
+			if (previous)
+				return previous;
+			return list;
+		}
+		previous = cur;
+	}
+
+	return list;
+}
+
+static struct commit_list *managed_skipped(struct commit_list *list,
+					   struct commit_list **tried)
+{
+	int count, skipped_first;
+	int skip_num, skip_denom;
+
+	*tried = NULL;
+
+	if (!skipped_revs.sha1_nr)
+		return list;
+
+	list = filter_skipped(list, tried, 0, &count, &skipped_first);
+
+	if (!skipped_first)
+		return list;
+
+	/* Use alternatively 1/5, 2/5 and 3/5 as skip ratio. */
+	skip_num = count % 3 + 1;
+	skip_denom = 5;
+
+	return apply_skip_ratio(list, count, skip_num, skip_denom);
+}
+
 static void bisect_rev_setup(struct rev_info *revs, const char *prefix,
 			     const char *bad_format, const char *good_format,
 			     int read_paths)
@@ -897,7 +945,7 @@ int bisect_next_all(const char *prefix)
 
 	revs.commits = find_bisection(revs.commits, &reaches, &all,
 				       !!skipped_revs.sha1_nr);
-	revs.commits = filter_skipped(revs.commits, &tried, 0, NULL, NULL);
+	revs.commits = managed_skipped(revs.commits, &tried);
 
 	if (!revs.commits) {
 		/*
-- 
1.6.3.GIT

^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH v3 3/3] t6030: test skipping away from an already skipped commit
  2009-06-06  4:41 [PATCH v3 0/3] automatically skip away from broken commits Christian Couder
  2009-06-06  4:41 ` [PATCH v3 1/3] bisect: add parameters to "filter_skipped" Christian Couder
  2009-06-06  4:41 ` [PATCH v3 2/3] bisect: when skipping, choose a commit away from a skipped commit Christian Couder
@ 2009-06-06  4:41 ` Christian Couder
  2009-06-06 19:51 ` [PATCH v3 0/3] automatically skip away from broken commits Junio C Hamano
  3 siblings, 0 replies; 37+ messages in thread
From: Christian Couder @ 2009-06-06  4:41 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Sam Vilain, H. Peter Anvin, Ingo Molnar


Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
---
 t/t6030-bisect-porcelain.sh |   12 ++++++++++++
 1 files changed, 12 insertions(+), 0 deletions(-)

diff --git a/t/t6030-bisect-porcelain.sh b/t/t6030-bisect-porcelain.sh
index 5254b23..4556cdd 100755
--- a/t/t6030-bisect-porcelain.sh
+++ b/t/t6030-bisect-porcelain.sh
@@ -555,6 +555,18 @@ test_expect_success 'restricting bisection on one dir and a file' '
 	grep "$PARA_HASH4 is first bad commit" my_bisect_log.txt
 '
 
+test_expect_success 'skipping away from skipped commit' '
+	git bisect start $PARA_HASH7 $HASH1 &&
+	para4=$(git rev-parse --verify HEAD) &&
+	test "$para4" = "$PARA_HASH4" &&
+        git bisect skip &&
+	hash7=$(git rev-parse --verify HEAD) &&
+	test "$hash7" = "$HASH7" &&
+        git bisect skip &&
+	hash3=$(git rev-parse --verify HEAD) &&
+	test "$hash3" = "$HASH3"
+'
+
 #
 #
 test_done
-- 
1.6.3.GIT

^ permalink raw reply related	[flat|nested] 37+ messages in thread

* Re: [PATCH v3 0/3] automatically skip away from broken commits
  2009-06-06  4:41 [PATCH v3 0/3] automatically skip away from broken commits Christian Couder
                   ` (2 preceding siblings ...)
  2009-06-06  4:41 ` [PATCH v3 3/3] t6030: test skipping away from an already " Christian Couder
@ 2009-06-06 19:51 ` Junio C Hamano
  2009-06-07  7:32   ` Christian Couder
  3 siblings, 1 reply; 37+ messages in thread
From: Junio C Hamano @ 2009-06-06 19:51 UTC (permalink / raw)
  To: Christian Couder; +Cc: git, Sam Vilain, H. Peter Anvin, Ingo Molnar

Here is a fairly unscientific test to see how these three patches improve
things.  It involves two test scripts:

 - ./+runme is a "bisect run" script.  It says "tests OK" for all the
   commits given, except that it says "untestable" for all commits that
   are on the side branch of a merge that we first test.  I think this
   reflects what happens in the real world setting very well, in that a
   merge brings in a topic whose breakge is irrelevant to the bug we are
   hunting.

 - ./+runall is run in one git tree with two parameters (bad commit and
   good commit); it assumes that

   - ./+runme script above is in /var/tmp/+runme;

   - a version of git without these three patches is installed with
     prefix=/var/tmp/cc-original;

   - a version of git with these three patches is installed with
     prefix=/var/tmp/cc-updated;

   and runs the bisection using both versions of git.

The results are not that great; these three patches do not give clear win
as we hoped:

    $ linux-2.6/master; /var/tmp/+runall v2.6.30-rc8 v2.6.30-rc1
    Total 2681 commits...
    Marked 254 commits as untestable
    Took 14 rounds with cc-updated
    Marked 254 commits as untestable
    Took 13 rounds with cc-original
    $ linux-2.6/master; /var/tmp/+runall v2.6.30-rc8 v2.6.29
    Total 12917 commits...
    Marked 141 commits as untestable
    Took 15 rounds with cc-updated
    Marked 141 commits as untestable
    Took 15 rounds with cc-original
    $ linux-2.6/master; /var/tmp/+runall v2.6.30-rc1 v2.6.29
    Total 10236 commits...
    Marked 7749 commits as untestable
    Took 15 rounds with cc-updated
    Marked 7749 commits as untestable
    Took 14 rounds with cc-original

I think this shows that the "skip ratio" heuristics based on the distance
in the "goodness scale" space does not help in avoiding commits that are
close in topological space.  There may be cases where the version with
patch gives fewer rounds especially when the history is very linear, but I
was mostly interested in the number of commits at least in the thousands,
which I think is what we should optimize things for, not a toy history of
linear 100 commits.

Here are the test scripts you can use to reproduce the results.

diff --git a/+runall b/+runall
new file mode 100755
index 0000000..291d000
--- /dev/null
+++ b/+runall
@@ -0,0 +1,23 @@
+#!/bin/sh
+
+BAD=${1?BAD}
+GOOD=${2?GOOD}
+TOTAL=$(git rev-list $GOOD..$BAD | wc -l)
+
+echo Total $TOTAL commits...
+
+for git in cc-updated cc-original
+do
+	logfile=/var/tmp/log-$git-$BAD-$GOOD
+	(
+		PATH=/var/tmp/$git/bin:$PATH
+		export PATH
+		rm -f .git/UNTESTABLE
+		git bisect reset
+		git bisect start $BAD $GOOD
+		git bisect run /var/tmp/+runme
+		git bisect reset
+	) >$logfile 2>&1
+	grep "^Marked " $logfile
+	echo Took $(grep -c "Bisecting:" $logfile) rounds with $git
+done
diff --git a/+runme b/+runme
new file mode 100755
index 0000000..7b77338
--- /dev/null
+++ b/+runme
@@ -0,0 +1,23 @@
+#!/bin/sh
+
+# Pretend that the first merged branch were untestable
+
+THIS=$(git rev-parse HEAD)
+
+if ! test -f .git/UNTESTABLE
+then
+	if MERGED=$(git rev-parse HEAD^2 2>/dev/null)
+	then
+		git rev-list HEAD^ ^$MERGED >.git/UNTESTABLE
+		echo Marked $(wc -l <.git/UNTESTABLE) commits as untestable
+		exit 125
+	else
+		exit 0
+	fi
+fi
+
+if grep "$THIS" .git/UNTESTABLE >/dev/null
+then
+	exit 125
+fi
+exit 0




   

^ permalink raw reply related	[flat|nested] 37+ messages in thread

* Re: [PATCH v3 0/3] automatically skip away from broken commits
  2009-06-06 19:51 ` [PATCH v3 0/3] automatically skip away from broken commits Junio C Hamano
@ 2009-06-07  7:32   ` Christian Couder
  2009-06-08  6:06     ` H. Peter Anvin
  2009-06-13  7:50     ` Christian Couder
  0 siblings, 2 replies; 37+ messages in thread
From: Christian Couder @ 2009-06-07  7:32 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Sam Vilain, H. Peter Anvin, Ingo Molnar

Le Saturday 06 June 2009, Junio C Hamano a écrit :
> Here is a fairly unscientific test to see how these three patches improve
> things.  It involves two test scripts:
>
>  - ./+runme is a "bisect run" script.  It says "tests OK" for all the
>    commits given, except that it says "untestable" for all commits that
>    are on the side branch of a merge that we first test.  I think this
>    reflects what happens in the real world setting very well, in that a
>    merge brings in a topic whose breakge is irrelevant to the bug we are
>    hunting.
>
>  - ./+runall is run in one git tree with two parameters (bad commit and
>    good commit); it assumes that
>
>    - ./+runme script above is in /var/tmp/+runme;
>
>    - a version of git without these three patches is installed with
>      prefix=/var/tmp/cc-original;
>
>    - a version of git with these three patches is installed with
>      prefix=/var/tmp/cc-updated;
>
>    and runs the bisection using both versions of git.
>
> The results are not that great; these three patches do not give clear win
> as we hoped:
>
>     $ linux-2.6/master; /var/tmp/+runall v2.6.30-rc8 v2.6.30-rc1
>     Total 2681 commits...
>     Marked 254 commits as untestable
>     Took 14 rounds with cc-updated
>     Marked 254 commits as untestable
>     Took 13 rounds with cc-original
>     $ linux-2.6/master; /var/tmp/+runall v2.6.30-rc8 v2.6.29
>     Total 12917 commits...
>     Marked 141 commits as untestable
>     Took 15 rounds with cc-updated
>     Marked 141 commits as untestable
>     Took 15 rounds with cc-original
>     $ linux-2.6/master; /var/tmp/+runall v2.6.30-rc1 v2.6.29
>     Total 10236 commits...
>     Marked 7749 commits as untestable
>     Took 15 rounds with cc-updated
>     Marked 7749 commits as untestable
>     Took 14 rounds with cc-original
>
> I think this shows that the "skip ratio" heuristics based on the distance
> in the "goodness scale" space does not help in avoiding commits that are
> close in topological space.  There may be cases where the version with
> patch gives fewer rounds especially when the history is very linear, but
> I was mostly interested in the number of commits at least in the
> thousands, which I think is what we should optimize things for, not a toy
> history of linear 100 commits.

I get the same results as yours, and I think that in these tests cases "git 
bisect" was not stuck with having only untestable commits with the 
highest "goodness" values. So in these cases the original behavior does 
quite well and that's why the updated behavior can't do better.

> Here are the test scripts you can use to reproduce the results.
>
> diff --git a/+runall b/+runall
> new file mode 100755
> index 0000000..291d000
> --- /dev/null
> +++ b/+runall
> @@ -0,0 +1,23 @@
> +#!/bin/sh
> +
> +BAD=${1?BAD}
> +GOOD=${2?GOOD}
> +TOTAL=$(git rev-list $GOOD..$BAD | wc -l)
> +
> +echo Total $TOTAL commits...
> +
> +for git in cc-updated cc-original
> +do
> +	logfile=/var/tmp/log-$git-$BAD-$GOOD
> +	(
> +		PATH=/var/tmp/$git/bin:$PATH
> +		export PATH
> +		rm -f .git/UNTESTABLE
> +		git bisect reset
> +		git bisect start $BAD $GOOD
> +		git bisect run /var/tmp/+runme
> +		git bisect reset
> +	) >$logfile 2>&1
> +	grep "^Marked " $logfile
> +	echo Took $(grep -c "Bisecting:" $logfile) rounds with $git
> +done
> diff --git a/+runme b/+runme
> new file mode 100755
> index 0000000..7b77338
> --- /dev/null
> +++ b/+runme
> @@ -0,0 +1,23 @@
> +#!/bin/sh
> +
> +# Pretend that the first merged branch were untestable
> +
> +THIS=$(git rev-parse HEAD)
> +
> +if ! test -f .git/UNTESTABLE
> +then
> +	if MERGED=$(git rev-parse HEAD^2 2>/dev/null)
> +	then
> +		git rev-list HEAD^ ^$MERGED >.git/UNTESTABLE
> +		echo Marked $(wc -l <.git/UNTESTABLE) commits as untestable
> +		exit 125
> +	else
> +		exit 0
> +	fi
> +fi
> +
> +if grep "$THIS" .git/UNTESTABLE >/dev/null
> +then
> +	exit 125
> +fi
> +exit 0

Thanks.

I renamed your +runme to +runme-good and created a +runme-bad script with 
the following changes:

$ diff -u +runme-good +runme-bad
--- +runme-good 2009-06-07 03:53:08.000000000 +0200
+++ +runme-bad  2009-06-07 08:11:33.000000000 +0200
@@ -12,7 +12,7 @@
                echo Marked $(wc -l <.git/UNTESTABLE) commits as untestable
                exit 125
        else
-               exit 0
+               exit 1
        fi
 fi

@@ -20,4 +20,4 @@
 then
        exit 125
 fi
-exit 0
+exit 1

So that the +runme-bad says "tests KO" (instead of "tests OK" for 
+runme-good) for all the commits given, except that it says "untestable" 
for all commits that are on the side branch of a merge that we first test.

Then I created the "+runme -> +runme-bad" symlink and tried your examples:

$ /var/tmp/+runall v2.6.30-rc8 v2.6.30-rc1
Total 2681 commits...
Marked 254 commits as untestable
Took 11 rounds with cc-updated
Marked 254 commits as untestable
Took 12 rounds with cc-original

$ /var/tmp/+runall v2.6.30-rc8 v2.6.29
Total 12917 commits...
Marked 141 commits as untestable
Took 15 rounds with cc-updated
Marked 141 commits as untestable
Took 15 rounds with cc-original

$ /var/tmp/+runall v2.6.30-rc1 v2.6.29
Total 10236 commits...
Marked 1777 commits as untestable
Took 9 rounds with cc-updated
Marked 1777 commits as untestable
Took 492 rounds with cc-original

The last one is very interesting. It seems that the original implementation 
got stuck while the updated one did a great job...

Best regards,
Christian.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v3 0/3] automatically skip away from broken commits
  2009-06-07  7:32   ` Christian Couder
@ 2009-06-08  6:06     ` H. Peter Anvin
  2009-06-08  7:25       ` Junio C Hamano
  2009-06-13  7:50     ` Christian Couder
  1 sibling, 1 reply; 37+ messages in thread
From: H. Peter Anvin @ 2009-06-08  6:06 UTC (permalink / raw)
  To: Christian Couder; +Cc: Junio C Hamano, git, Sam Vilain, Ingo Molnar

Christian Couder wrote:
>>
>> I think this shows that the "skip ratio" heuristics based on the distance
>> in the "goodness scale" space does not help in avoiding commits that are
>> close in topological space.  There may be cases where the version with
>> patch gives fewer rounds especially when the history is very linear, but
>> I was mostly interested in the number of commits at least in the
>> thousands, which I think is what we should optimize things for, not a toy
>> history of linear 100 commits.
> 
> I get the same results as yours, and I think that in these tests cases "git 
> bisect" was not stuck with having only untestable commits with the 
> highest "goodness" values. So in these cases the original behavior does 
> quite well and that's why the updated behavior can't do better.
> 

It's not entirely clear to me that this is any better than simply
randomly picking a commit from the list of plausible commits -- in other
words, eliminate the commits we can totally rule out, and then just pick
a random commit among the list of plausible commits.  This is not
*quite* as crazy as it sounds; it has the advantage of being an
extremely simple algorithm which shouldn't have any pathological behaviours.

The average information gain for a randomly picked commit is 1/(2 ln 2)
=~ 0.7213 bits, or an increase in the total bisect time by 39% over a
pure binary search.

	-hpa

-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v3 0/3] automatically skip away from broken commits
  2009-06-08  6:06     ` H. Peter Anvin
@ 2009-06-08  7:25       ` Junio C Hamano
  2009-06-08 15:51         ` H. Peter Anvin
  0 siblings, 1 reply; 37+ messages in thread
From: Junio C Hamano @ 2009-06-08  7:25 UTC (permalink / raw)
  To: H. Peter Anvin; +Cc: Christian Couder, git, Sam Vilain, Ingo Molnar

"H. Peter Anvin" <hpa@zytor.com> writes:

> It's not entirely clear to me that this is any better than simply
> randomly picking a commit from the list of plausible commits -- in other
> words, eliminate the commits we can totally rule out, and then just pick
> a random commit among the list of plausible commits.  This is not
> *quite* as crazy as it sounds; it has the advantage of being an
> extremely simple algorithm which shouldn't have any pathological behaviours.

That is essentially what Christian's patch does.  It does not try to go
away from untestable commits in topological space.  Instead, when we find
that the commit with the best "goodness" value is known to be untestable,
we step away from that commit by some alternating distance _in the
goodness value space_ (which does not have much to do with how commit
ancestry topology is laid out).  Viewed in the topology space, it is quite
similar to picking a different commit randomly, except for a very special
case where the remaining history is completely linear, in which case the
goodness value space and ancestry topology have a direct correlation.

That special case, and the deterministic hence repeatable nature of the
algorithm, are the two main advantages over picking a completely random
commit among the list of plausible commits.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v3 0/3] automatically skip away from broken commits
  2009-06-08  7:25       ` Junio C Hamano
@ 2009-06-08 15:51         ` H. Peter Anvin
  2009-06-08 21:02           ` Junio C Hamano
  0 siblings, 1 reply; 37+ messages in thread
From: H. Peter Anvin @ 2009-06-08 15:51 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Christian Couder, git, Sam Vilain, Ingo Molnar

Junio C Hamano wrote:
> "H. Peter Anvin" <hpa@zytor.com> writes:
> 
>> It's not entirely clear to me that this is any better than simply
>> randomly picking a commit from the list of plausible commits -- in other
>> words, eliminate the commits we can totally rule out, and then just pick
>> a random commit among the list of plausible commits.  This is not
>> *quite* as crazy as it sounds; it has the advantage of being an
>> extremely simple algorithm which shouldn't have any pathological behaviours.
> 
> That is essentially what Christian's patch does.  It does not try to go
> away from untestable commits in topological space.  Instead, when we find
> that the commit with the best "goodness" value is known to be untestable,
> we step away from that commit by some alternating distance _in the
> goodness value space_ (which does not have much to do with how commit
> ancestry topology is laid out).  Viewed in the topology space, it is quite
> similar to picking a different commit randomly, except for a very special
> case where the remaining history is completely linear, in which case the
> goodness value space and ancestry topology have a direct correlation.
> 
> That special case, and the deterministic hence repeatable nature of the
> algorithm, are the two main advantages over picking a completely random
> commit among the list of plausible commits.

Well, the cyclic "stepping distance" is pretty much a really lame PRNG
in this case.  In the linear case I think the distances are rather
arbitrary (and suboptimal), and I'm not sure it wouldn't simply be
better to actually use a PRNG (which can be unseeded and therefore
repeateable, or perhaps even better seeded with some combination of the
hash values involved.)

The advantage of that -- and I have to admit I don't know if it will
ever matter in practice -- is that using an actual PRNG:

a) is less likely to get into pathological capture behaviors.
b) doesn't make people think later that there is something magic to the
   arbitrary chosen numbers.

	-hpa

-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v3 0/3] automatically skip away from broken commits
  2009-06-08 15:51         ` H. Peter Anvin
@ 2009-06-08 21:02           ` Junio C Hamano
  2009-06-08 21:10             ` H. Peter Anvin
  2009-06-09  4:24             ` Christian Couder
  0 siblings, 2 replies; 37+ messages in thread
From: Junio C Hamano @ 2009-06-08 21:02 UTC (permalink / raw)
  To: H. Peter Anvin; +Cc: Christian Couder, git, Sam Vilain, Ingo Molnar

"H. Peter Anvin" <hpa@zytor.com> writes:

> The advantage of that -- and I have to admit I don't know if it will
> ever matter in practice -- is that using an actual PRNG:
>
> a) is less likely to get into pathological capture behaviors.
> b) doesn't make people think later that there is something magic to the
>    arbitrary chosen numbers.

My gut feeling agrees with you that both are likely to be true; these are
good points.

Christian, what do you think?

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v3 0/3] automatically skip away from broken commits
  2009-06-08 21:02           ` Junio C Hamano
@ 2009-06-08 21:10             ` H. Peter Anvin
  2009-06-09  4:24             ` Christian Couder
  1 sibling, 0 replies; 37+ messages in thread
From: H. Peter Anvin @ 2009-06-08 21:10 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Christian Couder, git, Sam Vilain, Ingo Molnar

Junio C Hamano wrote:
> "H. Peter Anvin" <hpa@zytor.com> writes:
> 
>> The advantage of that -- and I have to admit I don't know if it will
>> ever matter in practice -- is that using an actual PRNG:
>>
>> a) is less likely to get into pathological capture behaviors.
>> b) doesn't make people think later that there is something magic to the
>>    arbitrary chosen numbers.
> 
> My gut feeling agrees with you that both are likely to be true; these are
> good points.
> 
> Christian, what do you think?

Note: I do believe we should keep the obvious optimization of "if there
are no skip points in the plausible range, use the best point."  We
don't want a 39% increase in bisect time when not using skip.

	-hpa

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v3 0/3] automatically skip away from broken commits
  2009-06-08 21:02           ` Junio C Hamano
  2009-06-08 21:10             ` H. Peter Anvin
@ 2009-06-09  4:24             ` Christian Couder
  2009-06-09 10:02               ` Jakub Narebski
  2009-06-09 12:26               ` Christian Couder
  1 sibling, 2 replies; 37+ messages in thread
From: Christian Couder @ 2009-06-09  4:24 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: H. Peter Anvin, Christian Couder, git, Sam Vilain, Ingo Molnar

On Mon, Jun 8, 2009 at 11:02 PM, Junio C Hamano<gitster@pobox.com> wrote:
> "H. Peter Anvin" <hpa@zytor.com> writes:
>
>> The advantage of that -- and I have to admit I don't know if it will
>> ever matter in practice -- is that using an actual PRNG:
>>
>> a) is less likely to get into pathological capture behaviors.
>> b) doesn't make people think later that there is something magic to the
>>    arbitrary chosen numbers.
>
> My gut feeling agrees with you that both are likely to be true; these are
> good points.
>
> Christian, what do you think?

Here are some reasons why I think my algorithm might be better:

- using HPA's formula I get on average 0.86 bits of information at
each step when alternating (against 0.72 when using a PRNG)
- I think that if the branches in the graph merge often between each
other, then on a big scale it's like when you are on the linear case
- I don't think we should try too hard to avoid pathological capture
behaviors, because I think we can't avoid them anyway in some cases,
like if the first bad commit is near many untestable commits

In the end I think that when you have too many long and completely
untestable branches for example, the right solution would be to have
something that lets you cut them off your graph and bisect on a much
cleaner graph, that's why I started working on "git replace" in the
first place.

I think we should not penalize people who have a quite clean graph
because some other people have a much dirtier one.

So I would be ok to implement a config option or a switch to "git
bisect start" to let people use a PRNG instead of my algorithm but I
think something like my algorithm should be the default.

Best regards,
Christian.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v3 0/3] automatically skip away from broken commits
  2009-06-09  4:24             ` Christian Couder
@ 2009-06-09 10:02               ` Jakub Narebski
  2009-06-09 15:11                 ` H. Peter Anvin
  2009-06-09 12:26               ` Christian Couder
  1 sibling, 1 reply; 37+ messages in thread
From: Jakub Narebski @ 2009-06-09 10:02 UTC (permalink / raw)
  To: Christian Couder
  Cc: Junio C Hamano, H. Peter Anvin, Christian Couder, git, Sam Vilain,
	Ingo Molnar

Christian Couder <christian.couder@gmail.com> writes:
> On Mon, Jun 8, 2009 at 11:02 PM, Junio C Hamano<gitster@pobox.com> wrote:
>> "H. Peter Anvin" <hpa@zytor.com> writes:
>>
>>> The advantage of that -- and I have to admit I don't know if it will
>>> ever matter in practice -- is that using an actual PRNG:
>>>
>>> a) is less likely to get into pathological capture behaviors.
>>> b) doesn't make people think later that there is something magic to the
>>>    arbitrary chosen numbers.
>>
>> My gut feeling agrees with you that both are likely to be true; these are
>> good points.
>>
>> Christian, what do you think?
> 
> Here are some reasons why I think my algorithm might be better:
> 
> - using HPA's formula I get on average 0.86 bits of information at
> each step when alternating (against 0.72 when using a PRNG)
> - I think that if the branches in the graph merge often between each
> other, then on a big scale it's like when you are on the linear case
> - I don't think we should try too hard to avoid pathological capture
> behaviors, because I think we can't avoid them anyway in some cases,
> like if the first bad commit is near many untestable commits


By the way, I have asked question about best algorithm for "bisect skip"
on StackOverflow[1], but didn't get (yet) any good responses...

[1]: http://stackoverflow.com/questions/959324/

-- 
Jakub Narebski
Poland
ShadeHawk on #git

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v3 0/3] automatically skip away from broken commits
  2009-06-09  4:24             ` Christian Couder
  2009-06-09 10:02               ` Jakub Narebski
@ 2009-06-09 12:26               ` Christian Couder
  2009-06-09 15:25                 ` H. Peter Anvin
  1 sibling, 1 reply; 37+ messages in thread
From: Christian Couder @ 2009-06-09 12:26 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: H. Peter Anvin, Christian Couder, git, Sam Vilain, Ingo Molnar

On Tue, Jun 9, 2009 at 6:24 AM, Christian
Couder<christian.couder@gmail.com> wrote:
>
> So I would be ok to implement a config option or a switch to "git
> bisect start" to let people use a PRNG instead of my algorithm but I
> think something like my algorithm should be the default.

Another reason to have 2 algorithms is that when you use "git bisect
run" you might want to use the PRNG one because:

- you don't care much if the bisection use some more steps (as long as
it does not get stuck)
- you can't do much if it get stuck

On the other hand, when you bisect manually:

- you probably won't like it if you are asked to test some commits
that won't give a lot of information
- if it get stuck, you can manually use "git bisect visualize" and/or
"git bisect skip <range>" and/or some other manual commands to do
something about it

Regards,
Christian.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v3 0/3] automatically skip away from broken commits
  2009-06-09 10:02               ` Jakub Narebski
@ 2009-06-09 15:11                 ` H. Peter Anvin
  2009-06-09 21:55                   ` Jakub Narebski
  0 siblings, 1 reply; 37+ messages in thread
From: H. Peter Anvin @ 2009-06-09 15:11 UTC (permalink / raw)
  To: Jakub Narebski
  Cc: Christian Couder, Junio C Hamano, Christian Couder, git,
	Sam Vilain, Ingo Molnar

Jakub Narebski wrote:
> 
> By the way, I have asked question about best algorithm for "bisect skip"
> on StackOverflow[1], but didn't get (yet) any good responses...
> 
> [1]: http://stackoverflow.com/questions/959324/
> 

I don't think there is a "best" algorithm, but I concur with the poster
that said broken commits tend to cluster.

	-hpa

-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v3 0/3] automatically skip away from broken commits
  2009-06-09 12:26               ` Christian Couder
@ 2009-06-09 15:25                 ` H. Peter Anvin
  2009-06-09 18:35                   ` Junio C Hamano
  2009-06-09 19:28                   ` Christian Couder
  0 siblings, 2 replies; 37+ messages in thread
From: H. Peter Anvin @ 2009-06-09 15:25 UTC (permalink / raw)
  To: Christian Couder
  Cc: Junio C Hamano, Christian Couder, git, Sam Vilain, Ingo Molnar

Christian Couder wrote:
> On Tue, Jun 9, 2009 at 6:24 AM, Christian
> Couder<christian.couder@gmail.com> wrote:
>> So I would be ok to implement a config option or a switch to "git
>> bisect start" to let people use a PRNG instead of my algorithm but I
>> think something like my algorithm should be the default.
> 
> Another reason to have 2 algorithms is that when you use "git bisect
> run" you might want to use the PRNG one because:
> 
> - you don't care much if the bisection use some more steps (as long as
> it does not get stuck)
> - you can't do much if it get stuck
> 
> On the other hand, when you bisect manually:
> 
> - you probably won't like it if you are asked to test some commits
> that won't give a lot of information
> - if it get stuck, you can manually use "git bisect visualize" and/or
> "git bisect skip <range>" and/or some other manual commands to do
> something about it
> 

Sort-of-kind-of.  I doubt most users will be able to recover from a
stuck situation, and unless we have extremely high cost of testing
(which is true for some applications) then expecting the user to
optimizing manually is really bad user design.

My main objection to the "skip in goodness space" is exactly the same as
Junio's... it doesn't really buy you what it claims to sell.

	-hpa

-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v3 0/3] automatically skip away from broken commits
  2009-06-09 15:25                 ` H. Peter Anvin
@ 2009-06-09 18:35                   ` Junio C Hamano
  2009-06-09 18:42                     ` H. Peter Anvin
  2009-06-09 19:28                   ` Christian Couder
  1 sibling, 1 reply; 37+ messages in thread
From: Junio C Hamano @ 2009-06-09 18:35 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Christian Couder, Christian Couder, git, Sam Vilain, Ingo Molnar

"H. Peter Anvin" <hpa@zytor.com> writes:

> My main objection to the "skip in goodness space" is exactly the same as
> Junio's... it doesn't really buy you what it claims to sell.

It is no worse than the original "pick the next best in goodness space";
neither try to avoid the ones close to untestable ones.

So as long as it does not claim "we intelligently try to skip away from
untestable ones", I am actually Ok with Christian's patch.  It might do
worse than the random walk in pathological cases, but I suspect not by a
big margin.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v3 0/3] automatically skip away from broken commits
  2009-06-09 18:35                   ` Junio C Hamano
@ 2009-06-09 18:42                     ` H. Peter Anvin
  0 siblings, 0 replies; 37+ messages in thread
From: H. Peter Anvin @ 2009-06-09 18:42 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Christian Couder, Christian Couder, git, Sam Vilain, Ingo Molnar

Junio C Hamano wrote:
> "H. Peter Anvin" <hpa@zytor.com> writes:
> 
>> My main objection to the "skip in goodness space" is exactly the same as
>> Junio's... it doesn't really buy you what it claims to sell.
> 
> It is no worse than the original "pick the next best in goodness space";
> neither try to avoid the ones close to untestable ones.
> 

Well, it's certainly better than "next best in goodness space", which is
provably pessimal in many common cases.

> So as long as it does not claim "we intelligently try to skip away from
> untestable ones", I am actually Ok with Christian's patch.  It might do
> worse than the random walk in pathological cases, but I suspect not by a
> big margin.

	-hpa

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v3 0/3] automatically skip away from broken commits
  2009-06-09 15:25                 ` H. Peter Anvin
  2009-06-09 18:35                   ` Junio C Hamano
@ 2009-06-09 19:28                   ` Christian Couder
  2009-06-09 19:32                     ` H. Peter Anvin
  2009-06-09 20:37                     ` Junio C Hamano
  1 sibling, 2 replies; 37+ messages in thread
From: Christian Couder @ 2009-06-09 19:28 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Junio C Hamano, Christian Couder, git, Sam Vilain, Ingo Molnar

On Tue, Jun 9, 2009 at 5:25 PM, H. Peter Anvin<hpa@zytor.com> wrote:
> Christian Couder wrote:
>> On Tue, Jun 9, 2009 at 6:24 AM, Christian
>> Couder<christian.couder@gmail.com> wrote:
>>> So I would be ok to implement a config option or a switch to "git
>>> bisect start" to let people use a PRNG instead of my algorithm but I
>>> think something like my algorithm should be the default.
>>
>> Another reason to have 2 algorithms is that when you use "git bisect
>> run" you might want to use the PRNG one because:
>>
>> - you don't care much if the bisection use some more steps (as long as
>> it does not get stuck)
>> - you can't do much if it get stuck
>>
>> On the other hand, when you bisect manually:
>>
>> - you probably won't like it if you are asked to test some commits
>> that won't give a lot of information
>> - if it get stuck, you can manually use "git bisect visualize" and/or
>> "git bisect skip <range>" and/or some other manual commands to do
>> something about it
>>
>
> Sort-of-kind-of.  I doubt most users will be able to recover from a
> stuck situation, and unless we have extremely high cost of testing
> (which is true for some applications) then expecting the user to
> optimizing manually is really bad user design.

My opinion is that we should not penalize all the people working on
"quite clean" projects and also people working on "not clean" projects
who are able to recover, on the pretence that there are other people
on these "not clean" projects who are not.

I think it's the projects maintainers' responsibility to keep their
projects graphs quite clean (and they have the right to ask git
developers for the tools to do that). If they don't do so, then their
users will suffer anyway. So it's not a big deal to ask them to teach
their users to add a "--prng" option to "git bisect start" for example
or something like that to try to work around the "not cleanliness" of
their graphs.

Best regards,
Christian.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v3 0/3] automatically skip away from broken commits
  2009-06-09 19:28                   ` Christian Couder
@ 2009-06-09 19:32                     ` H. Peter Anvin
  2009-06-10  8:14                       ` Christian Couder
  2009-06-09 20:37                     ` Junio C Hamano
  1 sibling, 1 reply; 37+ messages in thread
From: H. Peter Anvin @ 2009-06-09 19:32 UTC (permalink / raw)
  To: Christian Couder
  Cc: Junio C Hamano, Christian Couder, git, Sam Vilain, Ingo Molnar

Christian Couder wrote:
>>>
>> Sort-of-kind-of.  I doubt most users will be able to recover from a
>> stuck situation, and unless we have extremely high cost of testing
>> (which is true for some applications) then expecting the user to
>> optimizing manually is really bad user design.
> 
> My opinion is that we should not penalize all the people working on
> "quite clean" projects and also people working on "not clean" projects
> who are able to recover, on the pretence that there are other people
> on these "not clean" projects who are not.
> 
> I think it's the projects maintainers' responsibility to keep their
> projects graphs quite clean (and they have the right to ask git
> developers for the tools to do that).

No, it's not.  This is saying "it's the user's responsibility to make up
for shortcomings in the tools", which is completely bass-ackwards.

> If they don't do so, then their
> users will suffer anyway. So it's not a big deal to ask them to teach
> their users to add a "--prng" option to "git bisect start" for example
> or something like that to try to work around the "not cleanliness" of
> their graphs.

Let's see... we can penalize the default user by 19% (the amount of
difference) if and only if they have skip points (at which point your
"project manager's responsibility" has already failed) or we can risk an
inexperienced user getting stuck?

	-hpa

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v3 0/3] automatically skip away from broken commits
  2009-06-09 19:28                   ` Christian Couder
  2009-06-09 19:32                     ` H. Peter Anvin
@ 2009-06-09 20:37                     ` Junio C Hamano
  2009-06-10 19:37                       ` Christian Couder
  1 sibling, 1 reply; 37+ messages in thread
From: Junio C Hamano @ 2009-06-09 20:37 UTC (permalink / raw)
  To: Christian Couder
  Cc: H. Peter Anvin, Christian Couder, git, Sam Vilain, Ingo Molnar

Christian Couder <christian.couder@gmail.com> writes:

> My opinion is that we should not penalize all the people working on
> "quite clean" projects and also people working on "not clean" projects
> who are able to recover, on the pretence that there are other people
> on these "not clean" projects who are not.
>
> I think it's the projects maintainers' responsibility to keep their
> projects graphs quite clean (and they have the right to ask git
> developers for the tools to do that).

You seem to be saying that a completely linear history is the only one
that is clean.

In an earlier message, I illustrated a case where two people independently
worked on unrelated topic and ended up merging their branches together, to
illustrate why your "not adjacent in goodness space" algorithm does not
give "away from untestable ones in topology space".

With your definition, that history is _not_ clean.  I do not think any
project wnats that kind of cleanness in their history.  You just made the
word "clean" to describe the history meaningless.

My take on the issue you mentioned above is that we should not penalize
the codepath's simplicity too much, only to cater to pathological
behaviour exhibited on an uninteresting special case of competely linear
history.

Your algorithm is Ok as an initial cut, in that it is an improvement over
the "next in goodness space, not even bother to avoid the pathological
case" algorithm.  But I do not think it is much better than HPA's "try the
best one if it is not skipped, otherwise pick one randomly", and if we
wanted to do better and to claim that we pick ones that are _away_ from
untestable ones, we do need to take topology into account.  That is all I
am saying.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v3 0/3] automatically skip away from broken commits
  2009-06-09 15:11                 ` H. Peter Anvin
@ 2009-06-09 21:55                   ` Jakub Narebski
  2009-06-09 22:54                     ` H. Peter Anvin
  0 siblings, 1 reply; 37+ messages in thread
From: Jakub Narebski @ 2009-06-09 21:55 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Christian Couder, Junio C Hamano, Christian Couder, git,
	Sam Vilain, Ingo Molnar

On Tue, 9 June 2009, H. Peter Anvin wrote:
> Jakub Narebski wrote:
> > 
> > By the way, I have asked question about best algorithm for "bisect skip"
> > on StackOverflow[1], but didn't get (yet) any good responses...
> > 
> > [1]: http://stackoverflow.com/questions/959324/
> > 
> 
> I don't think there is a "best" algorithm, but I concur with the poster
> that said broken commits tend to cluster.

Well, I guess that there might be, at least if we had some reasonable
assumption on probability distribution of bad commits.


Note: the idea sketched below is just handwaving currently...

Let us assume that we are currently at some untestable commit. Let us
also assume that we have some halfway reasonable model of probability
that a given commit is untestable, given it distance from known
untestable commit. "git rev-list --bisect-all" (or its inner equivalent)
would give us list of commits in the searched range, sorted in 
descending order by distance from edges (endpoints) of range:

  commit    "goodness" 
  --------------------
  c21d2e5*  (dist=60)
  94d6d14   (dist=59)
  ccb06f4   (dist=59)
  d1a1610   (dist=58)
  d4bf4b4   (dist=58)
  16c5646   (dist=57)

Let us assume that "c21d2e5" is untestable, and that we can easily 
calculate distance from it, substituting 0/0 (undef) if a commit
is not in straight line from "c21d2e5".

  commit    "goodness"  d
  -------------------------
  c21d2e5*  (dist=60)   0
  94d6d14   (dist=59)   1
  ccb06f4   (dist=59)   --
  d1a1610   (dist=58)   --
  d4bf4b4   (dist=58)   2
  16c5646   (dist=57)   3

Let us also assume that we have some model of probability that a commit
is untestable. In the example below numbers are ad hoc, and unrealistic.

  commit    "goodness"  d   P(untestable)
  ----------------------------------------
  c21d2e5*  (dist=60)   0   100%
  94d6d14   (dist=59)   1    75%
  ccb06f4   (dist=59)   --    0%
  d1a1610   (dist=58)   --    0%
  d4bf4b4   (dist=58)   2    33%
  16c5646   (dist=57)   3    25%

We can now calculate average number of bits of information would bring
(IIRC it was HPA and Christian that was writing about 'average information
gain' and 'bits of information at each step'; I don't quote know how it
is to be calculated)

  commit    "goodness"  d   P(untestable)  avg. gain
  ----------------------------------------
  c21d2e5*  (dist=60)   0   100%           0.0001
  94d6d14   (dist=59)   1    75%           0.45
  ccb06f4   (dist=59)   --    0%           0.98
  d1a1610   (dist=58)   --    0%           0.95
  d4bf4b4   (dist=58)   2    33%           0.65
  16c5646   (dist=57)   3    25%           0.66

Here 'avg. gain' numbers are totally handwaving... but the idea is to
pick up as next test point the commit with mist average information
gain.

What do you think of this algorithm (after of course it is made into
proper algorithm :-))?
-- 
Jakub Narebski
Poland

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v3 0/3] automatically skip away from broken commits
  2009-06-09 21:55                   ` Jakub Narebski
@ 2009-06-09 22:54                     ` H. Peter Anvin
  0 siblings, 0 replies; 37+ messages in thread
From: H. Peter Anvin @ 2009-06-09 22:54 UTC (permalink / raw)
  To: Jakub Narebski
  Cc: Christian Couder, Junio C Hamano, Christian Couder, git,
	Sam Vilain, Ingo Molnar

Jakub Narebski wrote:
> 
> Let us also assume that we have some model of probability that a commit
> is untestable. In the example below numbers are ad hoc, and unrealistic.
> 

What I mostly meant was that there simply is no such model that will be
ideal (since we simply don't have that information, almost by
definition), so therefore the overall algorithm can't be ideal, either.

However, Christian and you do make a very good point that instead of a
linear-probability random selection, it probably makes sense to bias the
randomness in favor of the commits that are more likely to provide
higher information gain.  This is effectively what Christian's patch
does in a somewhat clumsy way.

One of the nice things about combining a random algorithm with bias is
that the bias doesn't have to be perfect, it just have to be good
enough.  For example, we can take dramatic shortcuts like not taking
topology into accounts.

A logical bias function would indeed be an estimate of the information
gain.  One way we can calculate the effective information gain is by
take the list in "goodness order" that we already have, and treat it as
if it had originally been a linear history -- this will usually not be
the case, but we're probabilistically getting away with murder here.

The sorting in "goodness order" of a linear history means sorting
middlemost first, so the modified information density function with x
being the position in the list (x = 0 for best, x = 1 for worst) looks like:

- 1/(2 ln 2) * [ (1-x) ln (1-x) + (1+x) ln (1+x) - 2 ln 2 ]

I'd have to brush up some more of my calculus in order to remember how
to come up with a transformation function which would take a random
number and give us this exact probability distribution, but again, I
don't think it's hugely important; I suspect any function which gives us
a probability distribution that's even in the right neighborhood would
give us excellent results.

	-hpa

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v3 0/3] automatically skip away from broken commits
  2009-06-09 19:32                     ` H. Peter Anvin
@ 2009-06-10  8:14                       ` Christian Couder
  0 siblings, 0 replies; 37+ messages in thread
From: Christian Couder @ 2009-06-10  8:14 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Junio C Hamano, Christian Couder, git, Sam Vilain, Ingo Molnar

On Tue, Jun 9, 2009 at 9:32 PM, H. Peter Anvin<hpa@zytor.com> wrote:
> Christian Couder wrote:
>>>>
>>> Sort-of-kind-of.  I doubt most users will be able to recover from a
>>> stuck situation, and unless we have extremely high cost of testing
>>> (which is true for some applications) then expecting the user to
>>> optimizing manually is really bad user design.
>>
>> My opinion is that we should not penalize all the people working on
>> "quite clean" projects and also people working on "not clean" projects
>> who are able to recover, on the pretence that there are other people
>> on these "not clean" projects who are not.

By the way, if for example you have a project with 10% chance to land
in a "stuck area" of the graph, then with my algorithm the chance to
get stuck when you hit an untestable commit is 0.1^3 that is 0.1%. So
in this kind of projects the chance that the first bad commit is near
a lot untestable commits is much higher than that.

In fact I think that it's probably very hard to find a kind of project
where the chance to get stuck when using my algorithm is not dwarfed
by the chance of the first bad commit to be near or among many
untestable commits.

>> I think it's the projects maintainers' responsibility to keep their
>> projects graphs quite clean (and they have the right to ask git
>> developers for the tools to do that).
>
> No, it's not.  This is saying "it's the user's responsibility to make up
> for shortcomings in the tools", which is completely bass-ackwards.

I think that what I said is just the opposite that. If you find
shortcomings in the tools then you are welcome to ask us to fix them.

And if users find shortcomings in the cleanliness of the project
graph, they should be welcome to ask the maintainers to provide them
with ways to work around them, like for example a file that contains
many commits and range of commits that should always be skipped and
that can be used like for example:

$ git bisect start
$ git bisect skip $(cat always_skipped_file)
$ git bisect good ...
$ git bisect bad ...
...

Or the maintainers can in turn ask git developers what's going on with
"git replace" if they would prefer using that instead of such a
file...

>> If they don't do so, then their
>> users will suffer anyway. So it's not a big deal to ask them to teach
>> their users to add a "--prng" option to "git bisect start" for example
>> or something like that to try to work around the "not cleanliness" of
>> their graphs.
>
> Let's see... we can penalize the default user by 19% (the amount of
> difference) if and only if they have skip points (at which point your
> "project manager's responsibility" has already failed) or we can risk an
> inexperienced user getting stuck?

I am not sure I understand you very well but as I said above I think
with a "reasonably clean" graph the risk of the inexperienced user
getting stuck because of my algorithm is very low compared to the risk
of getting stuck because the first bad commit happens to be near many
untestable ones.

Best regards,
Christian.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v3 0/3] automatically skip away from broken commits
  2009-06-09 20:37                     ` Junio C Hamano
@ 2009-06-10 19:37                       ` Christian Couder
  2009-06-10 21:17                         ` Junio C Hamano
  0 siblings, 1 reply; 37+ messages in thread
From: Christian Couder @ 2009-06-10 19:37 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: H. Peter Anvin, Christian Couder, git, Sam Vilain, Ingo Molnar

On Tue, Jun 9, 2009 at 10:37 PM, Junio C Hamano<gitster@pobox.com> wrote:
> Christian Couder <christian.couder@gmail.com> writes:
>
>> My opinion is that we should not penalize all the people working on
>> "quite clean" projects and also people working on "not clean" projects
>> who are able to recover, on the pretence that there are other people
>> on these "not clean" projects who are not.
>>
>> I think it's the projects maintainers' responsibility to keep their
>> projects graphs quite clean (and they have the right to ask git
>> developers for the tools to do that).
>
> You seem to be saying that a completely linear history is the only one
> that is clean.

I was thinking about projects that use test suites to make sure new
commits don't break a lot of things.
But I should have said "to keep their projects graph quite clean or to
provide specific information about how to work around potential
problems when bisecting."

> In an earlier message, I illustrated a case where two people independently
> worked on unrelated topic and ended up merging their branches together, to
> illustrate why your "not adjacent in goodness space" algorithm does not
> give "away from untestable ones in topology space".
>
> With your definition, that history is _not_ clean.  I do not think any
> project wnats that kind of cleanness in their history.  You just made the
> word "clean" to describe the history meaningless.

When I wrote "clean", I just mean with not too many untestable commits.

> My take on the issue you mentioned above is that we should not penalize
> the codepath's simplicity too much, only to cater to pathological
> behaviour exhibited on an uninteresting special case of competely linear
> history.

I tried to make a short yet efficient implementation, not to favor a
special case.

> Your algorithm is Ok as an initial cut, in that it is an improvement over
> the "next in goodness space, not even bother to avoid the pathological
> case" algorithm.  But I do not think it is much better than HPA's "try the
> best one if it is not skipped, otherwise pick one randomly", and if we
> wanted to do better and to claim that we pick ones that are _away_ from
> untestable ones, we do need to take topology into account.  That is all I
> am saying.

Ok. I started working on optionaly using a PRNG but I am not sure that
you will want to add another one.

Best regards,
Christian.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v3 0/3] automatically skip away from broken commits
  2009-06-10 19:37                       ` Christian Couder
@ 2009-06-10 21:17                         ` Junio C Hamano
  2009-06-10 22:43                           ` H. Peter Anvin
  2009-06-11  4:02                           ` Christian Couder
  0 siblings, 2 replies; 37+ messages in thread
From: Junio C Hamano @ 2009-06-10 21:17 UTC (permalink / raw)
  To: Christian Couder
  Cc: Junio C Hamano, H. Peter Anvin, Christian Couder, git, Sam Vilain,
	Ingo Molnar

Christian Couder <christian.couder@gmail.com> writes:

> On Tue, Jun 9, 2009 at 10:37 PM, Junio C Hamano<gitster@pobox.com> wrote:
>> Christian Couder <christian.couder@gmail.com> writes:
>>
>>> My opinion is that we should not penalize all the people working on
>>> "quite clean" projects and also people working on "not clean" projects
>>> who are able to recover, on the pretence that there are other people
>>> on these "not clean" projects who are not.
> ...
> When I wrote "clean", I just mean with not too many untestable commits.

Ok, then the "opinion" in the above paragraph was simply stating the
obvious: we should have a good "bisect skip".  I obviously agree with that
;-).

In other words, you were not arguing against my observation that your
algorithm would not be much better than randomly picking the next commit
when the best one is untestable, unless the history is linear.  I guess
that was what I was confused with.  I thought you were saying that we
should give preferential treatment to people with linear history. 

> Ok. I started working on optionaly using a PRNG but I am not sure that
> you will want to add another one.

It may still make sense to replace, not add to, that "fixed alternating
distance in goodness space" with a randomized one, for the reasons HPA
stated, especially for avoiding to give a false impression that the magic
constants are picked for some reason.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v3 0/3] automatically skip away from broken commits
  2009-06-10 21:17                         ` Junio C Hamano
@ 2009-06-10 22:43                           ` H. Peter Anvin
  2009-06-11  4:02                           ` Christian Couder
  1 sibling, 0 replies; 37+ messages in thread
From: H. Peter Anvin @ 2009-06-10 22:43 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Christian Couder, Christian Couder, git, Sam Vilain, Ingo Molnar

Junio C Hamano wrote:
> Christian Couder <christian.couder@gmail.com> writes:
> 
>> On Tue, Jun 9, 2009 at 10:37 PM, Junio C Hamano<gitster@pobox.com> wrote:
>>> Christian Couder <christian.couder@gmail.com> writes:
>>>
>>>> My opinion is that we should not penalize all the people working on
>>>> "quite clean" projects and also people working on "not clean" projects
>>>> who are able to recover, on the pretence that there are other people
>>>> on these "not clean" projects who are not.
>> ...
>> When I wrote "clean", I just mean with not too many untestable commits.
> 
> Ok, then the "opinion" in the above paragraph was simply stating the
> obvious: we should have a good "bisect skip".  I obviously agree with that
> ;-).
> 
> In other words, you were not arguing against my observation that your
> algorithm would not be much better than randomly picking the next commit
> when the best one is untestable, unless the history is linear.  I guess
> that was what I was confused with.  I thought you were saying that we
> should give preferential treatment to people with linear history. 
> 
>> Ok. I started working on optionaly using a PRNG but I am not sure that
>> you will want to add another one.
> 
> It may still make sense to replace, not add to, that "fixed alternating
> distance in goodness space" with a randomized one, for the reasons HPA
> stated, especially for avoiding to give a false impression that the magic
> constants are picked for some reason.
> 

That being said, Christian's observation that a biased selection would
be better than a linear random pick is a good one.

	-hpa

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v3 0/3] automatically skip away from broken commits
  2009-06-10 21:17                         ` Junio C Hamano
  2009-06-10 22:43                           ` H. Peter Anvin
@ 2009-06-11  4:02                           ` Christian Couder
  2009-06-11  4:43                             ` H. Peter Anvin
  1 sibling, 1 reply; 37+ messages in thread
From: Christian Couder @ 2009-06-11  4:02 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Christian Couder, H. Peter Anvin, git, Sam Vilain, Ingo Molnar

Le Wednesday 10 June 2009, Junio C Hamano a écrit :
> Christian Couder <christian.couder@gmail.com> writes:
> > On Tue, Jun 9, 2009 at 10:37 PM, Junio C Hamano<gitster@pobox.com> 
wrote:
> >> Christian Couder <christian.couder@gmail.com> writes:
> >>> My opinion is that we should not penalize all the people working on
> >>> "quite clean" projects and also people working on "not clean"
> >>> projects who are able to recover, on the pretence that there are
> >>> other people on these "not clean" projects who are not.
> >
> > ...
> > When I wrote "clean", I just mean with not too many untestable commits.
>
> Ok, then the "opinion" in the above paragraph was simply stating the
> obvious: we should have a good "bisect skip".  I obviously agree with
> that ;-).
>
> In other words, you were not arguing against my observation that your
> algorithm would not be much better than randomly picking the next commit
> when the best one is untestable, unless the history is linear.

I think my algorithm is better enough than a random one to be worth using by 
default. Like HPA says it's in practice like a random one with a bias.

That's because the "goodness" value is something that has a relationship 
with the graph topology. The "goodness" value is some kind of distance from 
either the good or the bad commits. The farther from the good and bad 
commits the higher is the "goodness" value. And my algorithm tries to avoid 
commits with low "goodness" value because they should be those near the 
good and bad commits and we know that those near the good and bad commits 
wont give a lot of information.

> I guess 
> that was what I was confused with.  I thought you were saying that we
> should give preferential treatment to people with linear history.
>
> > Ok. I started working on optionaly using a PRNG but I am not sure that
> > you will want to add another one.
>
> It may still make sense to replace, not add to, that "fixed alternating
> distance in goodness space" with a randomized one, for the reasons HPA
> stated, especially for avoiding to give a false impression that the magic
> constants are picked for some reason.

But there _is_ a reason.

Best regards,
Christian.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v3 0/3] automatically skip away from broken commits
  2009-06-11  4:02                           ` Christian Couder
@ 2009-06-11  4:43                             ` H. Peter Anvin
  2009-06-11  5:05                               ` H. Peter Anvin
  0 siblings, 1 reply; 37+ messages in thread
From: H. Peter Anvin @ 2009-06-11  4:43 UTC (permalink / raw)
  To: Christian Couder
  Cc: Junio C Hamano, Christian Couder, git, Sam Vilain, Ingo Molnar

Christian Couder wrote:
> 
> I think my algorithm is better enough than a random one to be worth using by 
> default. Like HPA says it's in practice like a random one with a bias.
> 

But it's a really poor such.  Using an actual biased PRNG will give the
best of both variants (a bias to prefer better commits, and still all
commits reachable.)

After playing around with Wolfram Alpha for way too long, it seems that
a suitable bias function would be to take the fourth root of the PRNG
value, i.e. raise it to the power 0.25.

	-hpa

-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v3 0/3] automatically skip away from broken commits
  2009-06-11  4:43                             ` H. Peter Anvin
@ 2009-06-11  5:05                               ` H. Peter Anvin
  2009-06-12 11:56                                 ` Christian Couder
  0 siblings, 1 reply; 37+ messages in thread
From: H. Peter Anvin @ 2009-06-11  5:05 UTC (permalink / raw)
  To: Christian Couder
  Cc: Junio C Hamano, Christian Couder, git, Sam Vilain, Ingo Molnar

H. Peter Anvin wrote:
> 
> After playing around with Wolfram Alpha for way too long, it seems that
> a suitable bias function would be to take the fourth root of the PRNG
> value, i.e. raise it to the power 0.25.
> 

Urk, I managed to get myself completely confused -- I did the series
approximation on the wrong side of inverting the function.  The correct
power is actually 1.5 (over the range 0 to 1), a value > 1 is necessary
to bias the PRNG toward the beginning (x = 0) of the list.

	-hpa

-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v3 0/3] automatically skip away from broken commits
  2009-06-11  5:05                               ` H. Peter Anvin
@ 2009-06-12 11:56                                 ` Christian Couder
  2009-06-13 19:03                                   ` H. Peter Anvin
  0 siblings, 1 reply; 37+ messages in thread
From: Christian Couder @ 2009-06-12 11:56 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Christian Couder, Junio C Hamano, git, Sam Vilain, Ingo Molnar

On Thu, Jun 11, 2009 at 7:05 AM, H. Peter Anvin<hpa@zytor.com> wrote:
>
> Urk, I managed to get myself completely confused -- I did the series
> approximation on the wrong side of inverting the function.  The correct
> power is actually 1.5 (over the range 0 to 1), a value > 1 is necessary
> to bias the PRNG toward the beginning (x = 0) of the list.

I started working on this, but I wonder if it's better to add a
#include <math.h> and link with -lm than to provide a custom sqrt
implementation. Too bad the best power is not 2.

To implement the PRNG, I guess that using something based on the
function given by "man 3 rand" should be ok:

int get_prn(int count) {
    count = count * 1103515245 + 12345;
    return((unsigned)(count/65536) % 32768);
}

where the "count" we pass is the count of elements in the list rather
than the static seed.

Best regards,
Christian.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v3 0/3] automatically skip away from broken commits
  2009-06-07  7:32   ` Christian Couder
  2009-06-08  6:06     ` H. Peter Anvin
@ 2009-06-13  7:50     ` Christian Couder
  1 sibling, 0 replies; 37+ messages in thread
From: Christian Couder @ 2009-06-13  7:50 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Sam Vilain, H. Peter Anvin, Ingo Molnar

Le Sunday 07 June 2009, Christian Couder a écrit :
> Le Saturday 06 June 2009, Junio C Hamano a écrit :
> > Here is a fairly unscientific test to see how these three patches
> > improve things.  It involves two test scripts:
> >
> >  - ./+runme is a "bisect run" script.  It says "tests OK" for all the
> >    commits given, except that it says "untestable" for all commits that
> >    are on the side branch of a merge that we first test.  I think this
> >    reflects what happens in the real world setting very well, in that a
> >    merge brings in a topic whose breakge is irrelevant to the bug we
> > are hunting.
> >
> >  - ./+runall is run in one git tree with two parameters (bad commit and
> >    good commit); it assumes that
> >
> >    - ./+runme script above is in /var/tmp/+runme;
> >
> >    - a version of git without these three patches is installed with
> >      prefix=/var/tmp/cc-original;
> >
> >    - a version of git with these three patches is installed with
> >      prefix=/var/tmp/cc-updated;
> >
> >    and runs the bisection using both versions of git.
> >
> > The results are not that great; these three patches do not give clear
> > win as we hoped:
> >
> >     $ linux-2.6/master; /var/tmp/+runall v2.6.30-rc8 v2.6.30-rc1
> >     Total 2681 commits...
> >     Marked 254 commits as untestable
> >     Took 14 rounds with cc-updated
> >     Marked 254 commits as untestable
> >     Took 13 rounds with cc-original
> >     $ linux-2.6/master; /var/tmp/+runall v2.6.30-rc8 v2.6.29
> >     Total 12917 commits...
> >     Marked 141 commits as untestable
> >     Took 15 rounds with cc-updated
> >     Marked 141 commits as untestable
> >     Took 15 rounds with cc-original
> >     $ linux-2.6/master; /var/tmp/+runall v2.6.30-rc1 v2.6.29
> >     Total 10236 commits...
> >     Marked 7749 commits as untestable
> >     Took 15 rounds with cc-updated
> >     Marked 7749 commits as untestable
> >     Took 14 rounds with cc-original
> >
> > I think this shows that the "skip ratio" heuristics based on the
> > distance in the "goodness scale" space does not help in avoiding
> > commits that are close in topological space.  There may be cases where
> > the version with patch gives fewer rounds especially when the history
> > is very linear, but I was mostly interested in the number of commits at
> > least in the thousands, which I think is what we should optimize things
> > for, not a toy history of linear 100 commits.
>
> I get the same results as yours, and I think that in these tests cases
> "git bisect" was not stuck with having only untestable commits with the
> highest "goodness" values. So in these cases the original behavior does
> quite well and that's why the updated behavior can't do better.
>
> > Here are the test scripts you can use to reproduce the results.
> >
> > diff --git a/+runall b/+runall
> > new file mode 100755
> > index 0000000..291d000
> > --- /dev/null
> > +++ b/+runall
> > @@ -0,0 +1,23 @@
> > +#!/bin/sh
> > +
> > +BAD=${1?BAD}
> > +GOOD=${2?GOOD}
> > +TOTAL=$(git rev-list $GOOD..$BAD | wc -l)
> > +
> > +echo Total $TOTAL commits...
> > +
> > +for git in cc-updated cc-original
> > +do
> > +	logfile=/var/tmp/log-$git-$BAD-$GOOD
> > +	(
> > +		PATH=/var/tmp/$git/bin:$PATH
> > +		export PATH
> > +		rm -f .git/UNTESTABLE
> > +		git bisect reset
> > +		git bisect start $BAD $GOOD
> > +		git bisect run /var/tmp/+runme
> > +		git bisect reset
> > +	) >$logfile 2>&1
> > +	grep "^Marked " $logfile
> > +	echo Took $(grep -c "Bisecting:" $logfile) rounds with $git
> > +done
> > diff --git a/+runme b/+runme
> > new file mode 100755
> > index 0000000..7b77338
> > --- /dev/null
> > +++ b/+runme
> > @@ -0,0 +1,23 @@
> > +#!/bin/sh
> > +
> > +# Pretend that the first merged branch were untestable
> > +
> > +THIS=$(git rev-parse HEAD)
> > +
> > +if ! test -f .git/UNTESTABLE
> > +then
> > +	if MERGED=$(git rev-parse HEAD^2 2>/dev/null)
> > +	then
> > +		git rev-list HEAD^ ^$MERGED >.git/UNTESTABLE
> > +		echo Marked $(wc -l <.git/UNTESTABLE) commits as untestable
> > +		exit 125
> > +	else
> > +		exit 0
> > +	fi
> > +fi
> > +
> > +if grep "$THIS" .git/UNTESTABLE >/dev/null
> > +then
> > +	exit 125
> > +fi
> > +exit 0
>
> Thanks.
>
> I renamed your +runme to +runme-good and created a +runme-bad script with
> the following changes:
>
> $ diff -u +runme-good +runme-bad
> --- +runme-good 2009-06-07 03:53:08.000000000 +0200
> +++ +runme-bad  2009-06-07 08:11:33.000000000 +0200
> @@ -12,7 +12,7 @@
>                 echo Marked $(wc -l <.git/UNTESTABLE) commits as
> untestable exit 125
>         else
> -               exit 0
> +               exit 1
>         fi
>  fi
>
> @@ -20,4 +20,4 @@
>  then
>         exit 125
>  fi
> -exit 0
> +exit 1
>
> So that the +runme-bad says "tests KO" (instead of "tests OK" for
> +runme-good) for all the commits given, except that it says "untestable"
> for all commits that are on the side branch of a merge that we first
> test.
>
> Then I created the "+runme -> +runme-bad" symlink and tried your
> examples:
>
> $ /var/tmp/+runall v2.6.30-rc8 v2.6.30-rc1
> Total 2681 commits...
> Marked 254 commits as untestable
> Took 11 rounds with cc-updated
> Marked 254 commits as untestable
> Took 12 rounds with cc-original
>
> $ /var/tmp/+runall v2.6.30-rc8 v2.6.29
> Total 12917 commits...
> Marked 141 commits as untestable
> Took 15 rounds with cc-updated
> Marked 141 commits as untestable
> Took 15 rounds with cc-original
>
> $ /var/tmp/+runall v2.6.30-rc1 v2.6.29
> Total 10236 commits...
> Marked 1777 commits as untestable
> Took 9 rounds with cc-updated
> Marked 1777 commits as untestable
> Took 492 rounds with cc-original
>
> The last one is very interesting. It seems that the original
> implementation got stuck while the updated one did a great job...

With the patch I just posted that use a PRNG with a bias, I get the 
following results.

With the "+runme -> +runme-good" symlink:

$ /var/tmp/+runall v2.6.30-rc8 v2.6.30-rc1
Total 2681 commits...
Marked 254 commits as untestable
Took 13 rounds with cc-updated
Marked 254 commits as untestable
Took 13 rounds with cc-original

$ /var/tmp/+runall v2.6.30-rc8 v2.6.29
Total 12917 commits...
Marked 141 commits as untestable
Took 15 rounds with cc-updated
Marked 141 commits as untestable
Took 15 rounds with cc-original

$ /var/tmp/+runall v2.6.30-rc1 v2.6.29
Total 10236 commits...
Marked 7749 commits as untestable
Took 14 rounds with cc-updated
Marked 7749 commits as untestable
Took 14 rounds with cc-original

And with the "+runme -> +runme-bad" symlink:

$ /var/tmp/+runall v2.6.30-rc8 v2.6.30-rc1
Total 2681 commits...
Marked 254 commits as untestable
Took 13 rounds with cc-updated
Marked 254 commits as untestable
Took 12 rounds with cc-original

$ /var/tmp/+runall v2.6.30-rc8 v2.6.29
Total 12917 commits...
Marked 141 commits as untestable
Took 15 rounds with cc-updated
Marked 141 commits as untestable
Took 15 rounds with cc-original

$ /var/tmp/+runall v2.6.30-rc1 v2.6.29
Total 10236 commits...
Marked 1777 commits as untestable
Took 12 rounds with cc-updated
Marked 1777 commits as untestable
Took 492 rounds with cc-original

So there the results are quite similar to my alternating between 3 ratios 
algorithm.

Best regards,
Christian.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v3 0/3] automatically skip away from broken commits
  2009-06-12 11:56                                 ` Christian Couder
@ 2009-06-13 19:03                                   ` H. Peter Anvin
  2009-06-13 19:35                                     ` Jakub Narebski
  2009-06-15  7:59                                     ` Christian Couder
  0 siblings, 2 replies; 37+ messages in thread
From: H. Peter Anvin @ 2009-06-13 19:03 UTC (permalink / raw)
  To: Christian Couder
  Cc: Christian Couder, Junio C Hamano, git, Sam Vilain, Ingo Molnar

Christian Couder wrote:
> On Thu, Jun 11, 2009 at 7:05 AM, H. Peter Anvin<hpa@zytor.com> wrote:
>> Urk, I managed to get myself completely confused -- I did the series
>> approximation on the wrong side of inverting the function.  The correct
>> power is actually 1.5 (over the range 0 to 1), a value > 1 is necessary
>> to bias the PRNG toward the beginning (x = 0) of the list.
> 
> I started working on this, but I wonder if it's better to add a
> #include <math.h> and link with -lm than to provide a custom sqrt
> implementation. Too bad the best power is not 2.
> 

That's what I would do.  It's not like sqrt() is a strange, unportable
function.

> To implement the PRNG, I guess that using something based on the
> function given by "man 3 rand" should be ok:
> 
> int get_prn(int count) {
>     count = count * 1103515245 + 12345;
>     return((unsigned)(count/65536) % 32768);
> }
> 
> where the "count" we pass is the count of elements in the list rather
> than the static seed.

Yes, or perhaps better we could use some combination of the SHA-1s
involved as seeds... they are rather nice for this as they are wide and
much better PRNGs than most classical algorithms.

The main problem with the above algorithm is that it only produces 16
bits of output, which when biased can turn into a fairly significant
granularity.

	-hpa

-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v3 0/3] automatically skip away from broken commits
  2009-06-13 19:03                                   ` H. Peter Anvin
@ 2009-06-13 19:35                                     ` Jakub Narebski
  2009-06-13 19:57                                       ` H. Peter Anvin
  2009-06-15  7:59                                     ` Christian Couder
  1 sibling, 1 reply; 37+ messages in thread
From: Jakub Narebski @ 2009-06-13 19:35 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Christian Couder, Christian Couder, Junio C Hamano, git,
	Sam Vilain, Ingo Molnar

"H. Peter Anvin" <hpa@zytor.com> writes:
> Christian Couder wrote:
>> On Thu, Jun 11, 2009 at 7:05 AM, H. Peter Anvin<hpa@zytor.com> wrote:

>> To implement the PRNG, I guess that using something based on the
>> function given by "man 3 rand" should be ok:
>> 
>> int get_prn(int count) {
>>     count = count * 1103515245 + 12345;
>>     return((unsigned)(count/65536) % 32768);
>> }
>> 
>> where the "count" we pass is the count of elements in the list rather
>> than the static seed.
> 
> Yes, or perhaps better we could use some combination of the SHA-1s
> involved as seeds... they are rather nice for this as they are wide and
> much better PRNGs than most classical algorithms.
> 
> The main problem with the above algorithm is that it only produces 16
> bits of output, which when biased can turn into a fairly significant
> granularity.

Why not borrow one of algorithms, e.g. taus[1] from GSL (GNU
Scientific Library)?

If I understand "Random Number Generator Performance" chapter in GSL
Manual it is of comparable performance of the above BSD `rand`
generator, and is of simulation quality.

[1] maximally equidistributed combined Tausworthe generator by L'Ecuyer

-- 
Jakub Narebski
Poland
ShadeHawk on #git

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v3 0/3] automatically skip away from broken commits
  2009-06-13 19:35                                     ` Jakub Narebski
@ 2009-06-13 19:57                                       ` H. Peter Anvin
  0 siblings, 0 replies; 37+ messages in thread
From: H. Peter Anvin @ 2009-06-13 19:57 UTC (permalink / raw)
  To: Jakub Narebski
  Cc: Christian Couder, Christian Couder, Junio C Hamano, git,
	Sam Vilain, Ingo Molnar

Jakub Narebski wrote:
> 
> Why not borrow one of algorithms, e.g. taus[1] from GSL (GNU
> Scientific Library)?
> 
> If I understand "Random Number Generator Performance" chapter in GSL
> Manual it is of comparable performance of the above BSD `rand`
> generator, and is of simulation quality.
> 
> [1] maximally equidistributed combined Tausworthe generator by L'Ecuyer
> 

Interesting paper (the algorithm is on page 8):

http://citeseerx.ist.psu.edu/viewdoc/download;jsessionid=F2BD39FF5C24221B424E12D9ED4E68B7?doi=10.1.1.43.4155&rep=rep1&type=pdf

The reason I suggested SHA-1 is because we already have SHA-1 code
involved (in fact, we even have a bunch of precalculated SHA-1s
available to us) and performance matters not one iota in this application.

	-hpa

-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v3 0/3] automatically skip away from broken commits
  2009-06-13 19:03                                   ` H. Peter Anvin
  2009-06-13 19:35                                     ` Jakub Narebski
@ 2009-06-15  7:59                                     ` Christian Couder
  2009-06-15 13:16                                       ` H. Peter Anvin
  1 sibling, 1 reply; 37+ messages in thread
From: Christian Couder @ 2009-06-15  7:59 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Christian Couder, Junio C Hamano, git, Sam Vilain, Ingo Molnar

On Sat, Jun 13, 2009 at 9:03 PM, H. Peter Anvin<hpa@zytor.com> wrote:
> Christian Couder wrote:
>> On Thu, Jun 11, 2009 at 7:05 AM, H. Peter Anvin<hpa@zytor.com> wrote:
>>> Urk, I managed to get myself completely confused -- I did the series
>>> approximation on the wrong side of inverting the function.  The correct
>>> power is actually 1.5 (over the range 0 to 1), a value > 1 is necessary
>>> to bias the PRNG toward the beginning (x = 0) of the list.
>>
>> I started working on this, but I wonder if it's better to add a
>> #include <math.h> and link with -lm than to provide a custom sqrt
>> implementation. Too bad the best power is not 2.
>>
>
> That's what I would do.  It's not like sqrt() is a strange, unportable
> function.

Yes, but I feared that there could be rounding related differences
between platforms.

By the way, could you explain why power 1.5 is better than 2? It would
be much simpler if we could avoid square rooting anything in the first
place.

>> To implement the PRNG, I guess that using something based on the
>> function given by "man 3 rand" should be ok:
>>
>> int get_prn(int count) {
>>     count = count * 1103515245 + 12345;
>>     return((unsigned)(count/65536) % 32768);
>> }
>>
>> where the "count" we pass is the count of elements in the list rather
>> than the static seed.
>
> Yes, or perhaps better we could use some combination of the SHA-1s
> involved as seeds... they are rather nice for this as they are wide and
> much better PRNGs than most classical algorithms.
>
> The main problem with the above algorithm is that it only produces 16
> bits of output, which when biased can turn into a fairly significant
> granularity.

I don't think this is a real problem for this application. In fact I
think it's already quite overkill and there are better things to do,
like looking for a commit on a different branch among the first ones
in the list, if we want to improve the current behavior.

So unless there is a real flaw, I am going to work on something else.

Best regards,
Christian.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v3 0/3] automatically skip away from broken commits
  2009-06-15  7:59                                     ` Christian Couder
@ 2009-06-15 13:16                                       ` H. Peter Anvin
  0 siblings, 0 replies; 37+ messages in thread
From: H. Peter Anvin @ 2009-06-15 13:16 UTC (permalink / raw)
  To: Christian Couder
  Cc: Christian Couder, Junio C Hamano, git, Sam Vilain, Ingo Molnar

Christian Couder wrote:
> Yes, but I feared that there could be rounding related differences
> between platforms.

Doesn't matter for this application.  Really.  Anything you're going to
do yourself will be much worse.  Realistically, you might have a
difference in the 53rd bit.

> By the way, could you explain why power 1.5 is better than 2? It would
> be much simpler if we could avoid square rooting anything in the first
> place.

It is the power function that most closely creates a distribution which
matches the value curve (rounded to a small fraction.)

> I don't think this is a real problem for this application. In fact I
> think it's already quite overkill and there are better things to do,
> like looking for a commit on a different branch among the first ones
> in the list, if we want to improve the current behavior.
> 
> So unless there is a real flaw, I am going to work on something else.

Makes sense.

	-hpa

-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.

^ permalink raw reply	[flat|nested] 37+ messages in thread

end of thread, other threads:[~2009-06-15 13:17 UTC | newest]

Thread overview: 37+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2009-06-06  4:41 [PATCH v3 0/3] automatically skip away from broken commits Christian Couder
2009-06-06  4:41 ` [PATCH v3 1/3] bisect: add parameters to "filter_skipped" Christian Couder
2009-06-06  4:41 ` [PATCH v3 2/3] bisect: when skipping, choose a commit away from a skipped commit Christian Couder
2009-06-06  4:41 ` [PATCH v3 3/3] t6030: test skipping away from an already " Christian Couder
2009-06-06 19:51 ` [PATCH v3 0/3] automatically skip away from broken commits Junio C Hamano
2009-06-07  7:32   ` Christian Couder
2009-06-08  6:06     ` H. Peter Anvin
2009-06-08  7:25       ` Junio C Hamano
2009-06-08 15:51         ` H. Peter Anvin
2009-06-08 21:02           ` Junio C Hamano
2009-06-08 21:10             ` H. Peter Anvin
2009-06-09  4:24             ` Christian Couder
2009-06-09 10:02               ` Jakub Narebski
2009-06-09 15:11                 ` H. Peter Anvin
2009-06-09 21:55                   ` Jakub Narebski
2009-06-09 22:54                     ` H. Peter Anvin
2009-06-09 12:26               ` Christian Couder
2009-06-09 15:25                 ` H. Peter Anvin
2009-06-09 18:35                   ` Junio C Hamano
2009-06-09 18:42                     ` H. Peter Anvin
2009-06-09 19:28                   ` Christian Couder
2009-06-09 19:32                     ` H. Peter Anvin
2009-06-10  8:14                       ` Christian Couder
2009-06-09 20:37                     ` Junio C Hamano
2009-06-10 19:37                       ` Christian Couder
2009-06-10 21:17                         ` Junio C Hamano
2009-06-10 22:43                           ` H. Peter Anvin
2009-06-11  4:02                           ` Christian Couder
2009-06-11  4:43                             ` H. Peter Anvin
2009-06-11  5:05                               ` H. Peter Anvin
2009-06-12 11:56                                 ` Christian Couder
2009-06-13 19:03                                   ` H. Peter Anvin
2009-06-13 19:35                                     ` Jakub Narebski
2009-06-13 19:57                                       ` H. Peter Anvin
2009-06-15  7:59                                     ` Christian Couder
2009-06-15 13:16                                       ` H. Peter Anvin
2009-06-13  7:50     ` Christian Couder

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).