git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [PATCH v2 0/3] automatically skip away from broken commits
@ 2009-06-05  4:10 Christian Couder
  2009-06-05  4:10 ` [PATCH v2 1/3] bisect: add parameters to "filter_skipped" Christian Couder
                   ` (2 more replies)
  0 siblings, 3 replies; 9+ messages in thread
From: Christian Couder @ 2009-06-05  4:10 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Sam Vilain, H. Peter Anvin, Ingo Molnar

This patch series makes "git bisect skip" skip away from an area where
the commits cannot be tested.

This is done automatically. The user has nothing todo. We alternatively
try to test commits 20%, 40% and 60% away from the optimal commit that
has been skipped.

  bisect: add parameters to "filter_skipped"
  bisect: when skipping, choose a commit away from a skipped commit
  t6030: test skipping away from an already skipped commit

 bisect.c                    |   77 ++++++++++++++++++++++++++++++++++++++++--
 bisect.h                    |    4 ++-
 builtin-rev-list.c          |    4 ++-
 t/t6030-bisect-porcelain.sh |   12 +++++++
 4 files changed, 91 insertions(+), 6 deletions(-)

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH v2 1/3] bisect: add parameters to "filter_skipped"
  2009-06-05  4:10 [PATCH v2 0/3] automatically skip away from broken commits Christian Couder
@ 2009-06-05  4:10 ` Christian Couder
  2009-06-05  6:48   ` Junio C Hamano
  2009-06-05  4:10 ` [PATCH v2 2/3] bisect: when skipping, choose a commit away from a skipped commit Christian Couder
  2009-06-05  4:10 ` [PATCH v2 3/3] t6030: test skipping away from an already " Christian Couder
  2 siblings, 1 reply; 9+ messages in thread
From: Christian Couder @ 2009-06-05  4:10 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Sam Vilain, H. Peter Anvin, Ingo Molnar

because we will need to get more information from this function in
some later patches.

The new "int *count" parameter gives the number of commits left after
the skipped commit have been filtered out.

The new "int *skipped_first" parameter tells us if the first commit
in the list has been skipped. Note that using this parameter also
changes the behavior of the function if the first commit is indeed
skipped. Because we assume that in this case we will want all the
filtered commits, not just the first one, even if "show_all" is not
set.

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
---
 bisect.c           |   26 ++++++++++++++++++++++----
 bisect.h           |    4 +++-
 builtin-rev-list.c |    4 +++-
 3 files changed, 28 insertions(+), 6 deletions(-)

diff --git a/bisect.c b/bisect.c
index 2c14f4d..02497f7 100644
--- a/bisect.c
+++ b/bisect.c
@@ -523,12 +523,19 @@ static char *join_sha1_array_hex(struct sha1_array *array, char delim)
 
 struct commit_list *filter_skipped(struct commit_list *list,
 				   struct commit_list **tried,
-				   int show_all)
+				   int show_all,
+				   int *count,
+				   int *skipped_first)
 {
 	struct commit_list *filtered = NULL, **f = &filtered;
 
 	*tried = NULL;
 
+	if (skipped_first)
+		*skipped_first = 0;
+	if (count)
+		*count = 0;
+
 	if (!skipped_revs.sha1_nr)
 		return list;
 
@@ -537,19 +544,30 @@ struct commit_list *filter_skipped(struct commit_list *list,
 		list->next = NULL;
 		if (0 <= lookup_sha1_array(&skipped_revs,
 					   list->item->object.sha1)) {
+			if (skipped_first && !*skipped_first)
+				*skipped_first = 1;
 			/* Move current to tried list */
 			*tried = list;
 			tried = &list->next;
 		} else {
-			if (!show_all)
-				return list;
+			if (!show_all) {
+				if (!skipped_first || !*skipped_first)
+					return list;
+			} else if (skipped_first && !*skipped_first) {
+				*skipped_first = -1;
+			}
 			/* Move current to filtered list */
 			*f = list;
 			f = &list->next;
+			if (count)
+				(*count)++;
 		}
 		list = next;
 	}
 
+	if (skipped_first && *skipped_first == -1)
+		*skipped_first = 0;
+
 	return filtered;
 }
 
@@ -865,7 +883,7 @@ int bisect_next_all(const char *prefix)
 
 	revs.commits = find_bisection(revs.commits, &reaches, &all,
 				       !!skipped_revs.sha1_nr);
-	revs.commits = filter_skipped(revs.commits, &tried, 0);
+	revs.commits = filter_skipped(revs.commits, &tried, 0, NULL, NULL);
 
 	if (!revs.commits) {
 		/*
diff --git a/bisect.h b/bisect.h
index fb744fd..82f8fc1 100644
--- a/bisect.h
+++ b/bisect.h
@@ -7,7 +7,9 @@ extern struct commit_list *find_bisection(struct commit_list *list,
 
 extern struct commit_list *filter_skipped(struct commit_list *list,
 					  struct commit_list **tried,
-					  int show_all);
+					  int show_all,
+					  int *count,
+					  int *skipped_first);
 
 extern void print_commit_list(struct commit_list *list,
 			      const char *format_cur,
diff --git a/builtin-rev-list.c b/builtin-rev-list.c
index 73bff84..69753dc 100644
--- a/builtin-rev-list.c
+++ b/builtin-rev-list.c
@@ -262,7 +262,9 @@ int show_bisect_vars(struct rev_list_info *info, int reaches, int all)
 	if (!revs->commits && !(flags & BISECT_SHOW_TRIED))
 		return 1;
 
-	revs->commits = filter_skipped(revs->commits, &tried, flags & BISECT_SHOW_ALL);
+	revs->commits = filter_skipped(revs->commits, &tried,
+				       flags & BISECT_SHOW_ALL,
+				       NULL, NULL);
 
 	/*
 	 * revs->commits can reach "reaches" commits among
-- 
1.6.3.GIT

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH v2 2/3] bisect: when skipping, choose a commit away from a skipped commit
  2009-06-05  4:10 [PATCH v2 0/3] automatically skip away from broken commits Christian Couder
  2009-06-05  4:10 ` [PATCH v2 1/3] bisect: add parameters to "filter_skipped" Christian Couder
@ 2009-06-05  4:10 ` Christian Couder
  2009-06-05  6:48   ` Junio C Hamano
  2009-06-05  4:10 ` [PATCH v2 3/3] t6030: test skipping away from an already " Christian Couder
  2 siblings, 1 reply; 9+ messages in thread
From: Christian Couder @ 2009-06-05  4:10 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Sam Vilain, H. Peter Anvin, Ingo Molnar

To do that a new function "apply_skip_ratio" is added and another
function "managed_skipped" is created to wrap both "filter_skipped"
and the previous one.

In "managed_skipped" we detect when we should choose a commit away
from a skipped one and then we automatically choose a skip ratio
to pass to "apply_skip_ratio".

The ratio is choosen so that it alternates between 1/5, 2/5 and
3/5.

In "apply_skip_ratio", we ignore a given ratio of all the commits
that could be tested.

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
---
 bisect.c |   53 ++++++++++++++++++++++++++++++++++++++++++++++++++++-
 1 files changed, 52 insertions(+), 1 deletions(-)

diff --git a/bisect.c b/bisect.c
index 02497f7..3ad9887 100644
--- a/bisect.c
+++ b/bisect.c
@@ -571,6 +571,57 @@ struct commit_list *filter_skipped(struct commit_list *list,
 	return filtered;
 }
 
+static struct commit_list *apply_skip_ratio(struct commit_list *list,
+					    int count,
+					    int skip_num, int skip_denom)
+{
+	int index, i;
+	struct commit_list *cur, *previous;
+
+	cur = list;
+	previous = NULL;
+	index = count * skip_num / skip_denom;
+
+	for (i = 0; cur; cur = cur->next, i++) {
+		if (i == index) {
+			if (hashcmp(cur->item->object.sha1, current_bad_sha1))
+				return cur;
+			if (previous)
+				return previous;
+			return list;
+		}
+		previous = cur;
+	}
+
+	return list;
+}
+
+static struct commit_list *managed_skipped(struct commit_list *list,
+					   struct commit_list **tried)
+{
+	int count, skipped_first;
+	int skip_num, skip_denom;
+
+	*tried = NULL;
+
+	if (!skipped_revs.sha1_nr)
+		return list;
+
+	if (!skip_num)
+		return filter_skipped(list, tried, 0, NULL, NULL);
+
+	list = filter_skipped(list, tried, 0, &count, &skipped_first);
+
+	if (!skipped_first)
+		return list;
+
+	/* Use alternatively 1/5, 2/5 and 3/5 as skip ratio. */
+	skip_num = count % 3 + 1;
+	skip_denom = 5;
+
+	return apply_skip_ratio(list, count, skip_num, skip_denom);
+}
+
 static void bisect_rev_setup(struct rev_info *revs, const char *prefix,
 			     const char *bad_format, const char *good_format,
 			     int read_paths)
@@ -883,7 +934,7 @@ int bisect_next_all(const char *prefix)
 
 	revs.commits = find_bisection(revs.commits, &reaches, &all,
 				       !!skipped_revs.sha1_nr);
-	revs.commits = filter_skipped(revs.commits, &tried, 0, NULL, NULL);
+	revs.commits = managed_skipped(revs.commits, &tried);
 
 	if (!revs.commits) {
 		/*
-- 
1.6.3.GIT

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH v2 3/3] t6030: test skipping away from an already skipped commit
  2009-06-05  4:10 [PATCH v2 0/3] automatically skip away from broken commits Christian Couder
  2009-06-05  4:10 ` [PATCH v2 1/3] bisect: add parameters to "filter_skipped" Christian Couder
  2009-06-05  4:10 ` [PATCH v2 2/3] bisect: when skipping, choose a commit away from a skipped commit Christian Couder
@ 2009-06-05  4:10 ` Christian Couder
  2 siblings, 0 replies; 9+ messages in thread
From: Christian Couder @ 2009-06-05  4:10 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Sam Vilain, H. Peter Anvin, Ingo Molnar


Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
---
 t/t6030-bisect-porcelain.sh |   12 ++++++++++++
 1 files changed, 12 insertions(+), 0 deletions(-)

diff --git a/t/t6030-bisect-porcelain.sh b/t/t6030-bisect-porcelain.sh
index 5254b23..4556cdd 100755
--- a/t/t6030-bisect-porcelain.sh
+++ b/t/t6030-bisect-porcelain.sh
@@ -555,6 +555,18 @@ test_expect_success 'restricting bisection on one dir and a file' '
 	grep "$PARA_HASH4 is first bad commit" my_bisect_log.txt
 '
 
+test_expect_success 'skipping away from skipped commit' '
+	git bisect start $PARA_HASH7 $HASH1 &&
+	para4=$(git rev-parse --verify HEAD) &&
+	test "$para4" = "$PARA_HASH4" &&
+        git bisect skip &&
+	hash7=$(git rev-parse --verify HEAD) &&
+	test "$hash7" = "$HASH7" &&
+        git bisect skip &&
+	hash3=$(git rev-parse --verify HEAD) &&
+	test "$hash3" = "$HASH3"
+'
+
 #
 #
 test_done
-- 
1.6.3.GIT

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH v2 1/3] bisect: add parameters to "filter_skipped"
  2009-06-05  4:10 ` [PATCH v2 1/3] bisect: add parameters to "filter_skipped" Christian Couder
@ 2009-06-05  6:48   ` Junio C Hamano
  2009-06-06  4:38     ` Christian Couder
  0 siblings, 1 reply; 9+ messages in thread
From: Junio C Hamano @ 2009-06-05  6:48 UTC (permalink / raw)
  To: Christian Couder
  Cc: Junio C Hamano, git, Sam Vilain, H. Peter Anvin, Ingo Molnar

Christian Couder <chriscool@tuxfamily.org> writes:

> because we will need to get more information from this function in
> some later patches.
>
> The new "int *count" parameter gives the number of commits left after
> the skipped commit have been filtered out.
>
> The new "int *skipped_first" parameter tells us if the first commit
> in the list has been skipped. Note that using this parameter also
> changes the behavior of the function if the first commit is indeed
> skipped. Because we assume that in this case we will want all the
> filtered commits, not just the first one, even if "show_all" is not
> set.

The way you use (*skipped_first == -1) as a marker to mean "we've already
checked more than one commit_list so even when we see a one to be skipped,
it won't be the first one" is unreadable, especially without explanation.
Worse yet, the above paragraph explains what the parameter does, but why
is it so special to skip the one that happens to be the first on the input
list, especially when one does not know how the list is sorted to begin
with.

I understand that the list is sorted by the "goodness" value, i.e. the one
that cuts the graph into closer-to-equal halves comes earlier in the list,
but still it is unclear why having to skip the best one is so special
compared to having to skip say the second best one, especially when you
imagine a case where the first two on the list are of equal "goodness"
value.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v2 2/3] bisect: when skipping, choose a commit away from a skipped commit
  2009-06-05  4:10 ` [PATCH v2 2/3] bisect: when skipping, choose a commit away from a skipped commit Christian Couder
@ 2009-06-05  6:48   ` Junio C Hamano
  2009-06-06  4:38     ` Christian Couder
  0 siblings, 1 reply; 9+ messages in thread
From: Junio C Hamano @ 2009-06-05  6:48 UTC (permalink / raw)
  To: Christian Couder; +Cc: git, Sam Vilain, H. Peter Anvin, Ingo Molnar

Christian Couder <chriscool@tuxfamily.org> writes:

> +static struct commit_list *apply_skip_ratio(struct commit_list *list,
> +					    int count,
> +					    int skip_num, int skip_denom)
> +{
> +	int index, i;
> +	struct commit_list *cur, *previous;
> +
> +	cur = list;
> +	previous = NULL;
> +	index = count * skip_num / skip_denom;
> +
> +	for (i = 0; cur; cur = cur->next, i++) {
> +		if (i == index) {
> +			if (hashcmp(cur->item->object.sha1, current_bad_sha1))
> +				return cur;
> +			if (previous)
> +				return previous;

When you return "previous", don't you need to make sure it is not the
currently bad one?  That is,...

> +			return list;
> +		}
> +		previous = cur;

... shouldn't this be

                if (hashcmp(cur->item->object.sha1, curret_bad_sha1))
                        previous = cur;

> +	}

I do not understand the math/logic you are using here.  The incoming list
is sorted by the "goodness" value, and untestable ones have already been
removed.  By picking ones that are far from the tip, you are deliberately
accepting far suboptimal bisection, but I do not see how you are
guaranteeing that making such a trade-off is driving you away from the
untestable ones.  It almost feels to me that it is no better than randomly
picking one from the list, so why can this be an improvement over "pick
the one that is testable that appears first in the list that is sorted by
the goodness value"?

Imagine you have a graph like this (G is good, B is bad, and both branches
have similar number of commits):

	G-------------y--z--a--b--c--B
                              /
        G--------------f--e--d

In the above graph, a and d would give the best bisection, because they
can reach about half the number of nodes in the entire graph, and the goal
of the basic "find_bisection()" is to find a commit that cuts the graph
into closest-to-the-equal halves.  For this reason, 'a' and 'd' would be
near the beginning of the output from find_bisection(find_all) you run
when there are untestable commits in your managed_skipped().

Suppose 'd' was already marked untestable, but 'a' is not.  And 'd' has
slightly better "goodness" value than 'a'.

	Side note.  I do not think the situation should change drastically
	if 'a' has a better "goodness" value than 'd' has, but your
	"skipped_first" logic that I already said I do not understand in
	my earlier comment treats them quite differently, so this example
	explicitly talks about the case where 'd' is better than 'a'.

You do not check 'a' but check somewhere much older, either on the top
branch or on the bottom branch.  Why?

'b', because it is direct descendant of 'd', is likely to have inherited
the unrelated breakage from 'd' that made it untestable.  'c', being its
descendant, can also be contaminated with the unrelated breakage, but
there is a chance that such an unrelated breakage was fixed there.
Similarly, 'e' is likely to share the same unrelated breakage as 'd' has,
because it is a direct ancestor.  'f' shares the possibility but has a
better chance of being testable than 'e' is.

But 'a' does not have any relationship with 'd' and it is unlikely such an
unrelated breakage exists there.

I _think_ the solution to the "bisection with untestable" problem should
be handled as an optimization problem of two goodness values that cannot
be directly compared and traded-off:

 - The point chosen should be close to the mid-point.  The closer "weight"
   value given in do_find_bisection() is to half the number of nodes, the
   better;

 - The point chosen should be far from any known untestable commits.  We
   do not currently have function to count distance from untestable
   commits, nor make a call to such a function after filtering untestable
   commits, but I think we should.

We already have an efficient and good implementation to compute the former
criteria (find_bisection).

The "badness" value that comes from the latter criteria essentially should
be how close, from ancestry point of view, each commit is related to the
ones that are known to be untestable.  In the picture, 'b' and 'e' are
close relatives (they have thick blood relationship with 'd') and worse
than 'c' and 'f' but 'a' shouldn't be punished for being married to 'd'
that has a bad blood at all.

Now coming back to the "skipped_first" topic.  Let's use the same graph,
but now let's assume neither 'a' nor 'd' is known to be untestable, and
also 'a' gives slightly better "goodness" value than 'd' does.  Further
suppose that 'z' was already known to be untestable this time.

Even though 'a' may have a better goodness value, hence it is not skipped,
shouldn't we be favoring 'd' over 'a' because 'a' is far likely to be
untestable (due to closer blood tie with 'z') than 'd' is, and 'a' and 'd'
would give roughly the same "goodness" value?

This is why I said I do not understand your skipped_first logic.

> +static struct commit_list *managed_skipped(struct commit_list *list,
> +					   struct commit_list **tried)
> +{
> +	int count, skipped_first;
> +	int skip_num, skip_denom;
> +
> +	*tried = NULL;
> +
> +	if (!skipped_revs.sha1_nr)
> +		return list;
> +
> +	if (!skip_num)
> +		return filter_skipped(list, tried, 0, NULL, NULL);

skip_num is uninitialized and then checked here.  Is it supposed to be
"static int" with implied 0 initialization?

> +	list = filter_skipped(list, tried, 0, &count, &skipped_first);
> +
> +	if (!skipped_first)
> +		return list;
> +
> +	/* Use alternatively 1/5, 2/5 and 3/5 as skip ratio. */
> +	skip_num = count % 3 + 1;
> +	skip_denom = 5;
> +
> +	return apply_skip_ratio(list, count, skip_num, skip_denom);
> +}

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v2 1/3] bisect: add parameters to "filter_skipped"
  2009-06-05  6:48   ` Junio C Hamano
@ 2009-06-06  4:38     ` Christian Couder
  0 siblings, 0 replies; 9+ messages in thread
From: Christian Couder @ 2009-06-06  4:38 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Sam Vilain, H. Peter Anvin, Ingo Molnar

Le Friday 05 June 2009, Junio C Hamano a écrit :
> Christian Couder <chriscool@tuxfamily.org> writes:
> > because we will need to get more information from this function in
> > some later patches.
> >
> > The new "int *count" parameter gives the number of commits left after
> > the skipped commit have been filtered out.
> >
> > The new "int *skipped_first" parameter tells us if the first commit
> > in the list has been skipped. Note that using this parameter also
> > changes the behavior of the function if the first commit is indeed
> > skipped. Because we assume that in this case we will want all the
> > filtered commits, not just the first one, even if "show_all" is not
> > set.
>
> The way you use (*skipped_first == -1) as a marker to mean "we've already
> checked more than one commit_list so even when we see a one to be
> skipped, it won't be the first one" is unreadable, especially without
> explanation. Worse yet, the above paragraph explains what the parameter
> does, but why is it so special to skip the one that happens to be the
> first on the input list, especially when one does not know how the list
> is sorted to begin with.

I added the following comment before the function:

/*
 * In this function, passing a not NULL skipped_first is very special.
 * It means that we want to know if the first commit in the list is
 * skipped because we will want to test a commit away from it if it is
 * indeed skipped.
 * So if the first commit is skipped, we cannot take the shortcut to
 * just "return list" when we find the first non skipped commit, we
 * have to return a fully filtered list.
 *
 * We use (*skipped_first == -1) to mean "it has been found that the
 * first commit is not skipped". In this case *skipped_first is set back
 * to 0 just before the function returns.
 */

I hope this is enough to clarify what this function does.

> I understand that the list is sorted by the "goodness" value, i.e. the
> one that cuts the graph into closer-to-equal halves comes earlier in the
> list, but still it is unclear why having to skip the best one is so
> special compared to having to skip say the second best one, especially
> when you imagine a case where the first two on the list are of equal
> "goodness" value.

The reason why only having to skip the best one is special is just because 
it is simpler to only check if the best one is skipped or not.

I agree that it could be an improvement to consider if other commits with 
the same goodness value are also skipped, but I think it would make the 
code more complex.

Thanks,
Christian.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v2 2/3] bisect: when skipping, choose a commit away from a skipped commit
  2009-06-05  6:48   ` Junio C Hamano
@ 2009-06-06  4:38     ` Christian Couder
  2009-06-06  5:19       ` Junio C Hamano
  0 siblings, 1 reply; 9+ messages in thread
From: Christian Couder @ 2009-06-06  4:38 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Sam Vilain, H. Peter Anvin, Ingo Molnar

Le Friday 05 June 2009, Junio C Hamano a écrit :
> Christian Couder <chriscool@tuxfamily.org> writes:
> > +static struct commit_list *apply_skip_ratio(struct commit_list *list,
> > +					    int count,
> > +					    int skip_num, int skip_denom)
> > +{
> > +	int index, i;
> > +	struct commit_list *cur, *previous;
> > +
> > +	cur = list;
> > +	previous = NULL;
> > +	index = count * skip_num / skip_denom;
> > +
> > +	for (i = 0; cur; cur = cur->next, i++) {
> > +		if (i == index) {
> > +			if (hashcmp(cur->item->object.sha1, current_bad_sha1))
> > +				return cur;
> > +			if (previous)
> > +				return previous;
>
> When you return "previous", don't you need to make sure it is not the
> currently bad one?  That is,...
>
> > +			return list;
> > +		}
> > +		previous = cur;
>
> ... shouldn't this be
>
>                 if (hashcmp(cur->item->object.sha1, curret_bad_sha1))
>                         previous = cur;
>
> > +	}

I think the current bad will always have the worst goodness value so it will 
always be the last item in the list, so it cannot be "previous".

You can try it out with "git rev-list --bisect-all HEAD ^HEAD~10" for 
example, the bad item (HEAD in this example) will always appear 
with "(dist=0)" at the end.

We rely on that in "bisect_next_all" to decide if we found the first bad 
commit. We do:

	bisect_rev = revs.commits->item->object.sha1;
	...

	if (!hashcmp(bisect_rev, current_bad_sha1)) {
		exit_if_skipped_commits(tried, current_bad_sha1);
		printf("%s is first bad commit\n", bisect_rev_hex);
		show_diff_tree(prefix, revs.commits->item);
		/* This means the bisection process succeeded. */
		exit(10);
	}

> I do not understand the math/logic you are using here.  The incoming list
> is sorted by the "goodness" value, and untestable ones have already been
> removed.

Yes.

> By picking ones that are far from the tip, you are deliberately 
> accepting far suboptimal bisection, but I do not see how you are
> guaranteeing that making such a trade-off is driving you away from the
> untestable ones. 

I think picking one commit far from the tip, doens't mean accepting far 
suboptimal bisection. HPA wrote in a recent thread (RFE: "git bisect 
reverse"):

"You don't drop to 1/2 bit of information until x = 0.11 or 0.89"

and I think the worst possible value with my code means x = 0.2 or x = 0.8 
(with the 3/5 ratio).

It's true that when using a ratio to skip away from an untestable one we 
don't check that we don't come near another untestable one. 

> It almost feels to me that it is no better than 
> randomly picking one from the list, so why can this be an improvement
> over "pick the one that is testable that appears first in the list that
> is sorted by the goodness value"?

We rely on the assumption that the probability that there are many 
untestable ones in many different places quite far away from an untestable 
one and from each other is very low.

For example using the following script:

-------------
#!/bin/sh

new_commit() {
    num="$1"
    echo "line for commit $num" >> hello
    git commit -m "commit $num" hello
}

touch hello
git add hello

for val in $(seq 100)
do
  new_commit $val
done
-------------

and then bisecting between the first and last commits gives something like:

$ git bisect start HEAD HEAD~99
Bisecting: 49 revisions left to test after this (roughly 6 steps)
[e65f680ee31729d2db66dbed7eb24ea8c0e8b6c3] commit 50
$ git bisect skip
Bisecting: 48 revisions left to test after this (roughly 6 steps)
[6cc0df3b9762928c90a5ce7ad4d27bf3340983c3] commit 51
$ git bisect skip
Bisecting: 48 revisions left to test after this (roughly 6 steps)
[45387fab3997ca8c449ac419cb401207aa63da4d] commit 71
$ git bisect skip
Bisecting: 48 revisions left to test after this (roughly 6 steps)
[dd20066ed8a287e8105b833911de7509ca5ec339] commit 40
$ git bisect skip
Bisecting: 48 revisions left to test after this (roughly 6 steps)
[8f0c63e6eede9650bb577dba6f3092703409509e] commit 20
$ git bisect skip
Bisecting: 48 revisions left to test after this (roughly 6 steps)
[4809a426fd411f2db43291bc5d206b199ca9e648] commit 30
$ git bisect skip
Bisecting: 48 revisions left to test after this (roughly 6 steps)
[095cc6a1c6f071f3807ca123ab5bcfc133f36f96] commit 61
$ git bisect skip
Bisecting: 48 revisions left to test after this (roughly 6 steps)
[94a00c0de61ed8ddef261bebfbbb00bc6e0a3a1f] commit 82
$ git bisect skip
Bisecting: 48 revisions left to test after this (roughly 6 steps)
[5d1a2332b7d5d409faebb7056e11028f5b7642f4] commit 29
$ git bisect skip
Bisecting: 48 revisions left to test after this (roughly 6 steps)
[ad46dfdfb0a871b13144395efe26607cb63c7f5b] commit 39

So after testing commits 50 and 51 that are near the middle, you test many 
commits far away from both commits 50 and 51 and from each other, and yet 
not too close to 0 or 100.

> Imagine you have a graph like this (G is good, B is bad, and both
> branches have similar number of commits):
>
> 	G-------------y--z--a--b--c--B
>                               /
>         G--------------f--e--d
>
> In the above graph, a and d would give the best bisection, because they
> can reach about half the number of nodes in the entire graph, and the
> goal of the basic "find_bisection()" is to find a commit that cuts the
> graph into closest-to-the-equal halves.  For this reason, 'a' and 'd'
> would be near the beginning of the output from find_bisection(find_all)
> you run when there are untestable commits in your managed_skipped().
>
> Suppose 'd' was already marked untestable, but 'a' is not.  And 'd' has
> slightly better "goodness" value than 'a'.
>
> 	Side note.  I do not think the situation should change drastically
> 	if 'a' has a better "goodness" value than 'd' has, but your
> 	"skipped_first" logic that I already said I do not understand in
> 	my earlier comment treats them quite differently, so this example
> 	explicitly talks about the case where 'd' is better than 'a'.
>
> You do not check 'a' but check somewhere much older, either on the top
> branch or on the bottom branch.  Why?

The reason is that it would make the code more complex to check that we are 
in this case, and that this case may not happen very often (it relies on 
both being near a merge and having branches of the same size), and that we 
don't loose much (see above my reference to what HPA wrote) by testing a 
commit quite far away.

> 'b', because it is direct descendant of 'd', is likely to have inherited
> the unrelated breakage from 'd' that made it untestable.  'c', being its
> descendant, can also be contaminated with the unrelated breakage, but
> there is a chance that such an unrelated breakage was fixed there.
> Similarly, 'e' is likely to share the same unrelated breakage as 'd' has,
> because it is a direct ancestor.  'f' shares the possibility but has a
> better chance of being testable than 'e' is.
>
> But 'a' does not have any relationship with 'd' and it is unlikely such
> an unrelated breakage exists there.
>
> I _think_ the solution to the "bisection with untestable" problem should
> be handled as an optimization problem of two goodness values that cannot
> be directly compared and traded-off:
>
>  - The point chosen should be close to the mid-point.  The closer
> "weight" value given in do_find_bisection() is to half the number of
> nodes, the better;

I am not so sure this is so important.

>  - The point chosen should be far from any known untestable commits.  We
>    do not currently have function to count distance from untestable
>    commits, nor make a call to such a function after filtering untestable
>    commits, but I think we should.

I think this should be the case with my patch series.

> We already have an efficient and good implementation to compute the
> former criteria (find_bisection).
>
> The "badness" value that comes from the latter criteria essentially
> should be how close, from ancestry point of view, each commit is related
> to the ones that are known to be untestable.  In the picture, 'b' and 'e'
> are close relatives (they have thick blood relationship with 'd') and
> worse than 'c' and 'f' but 'a' shouldn't be punished for being married to
> 'd' that has a bad blood at all.
>
> Now coming back to the "skipped_first" topic.  Let's use the same graph,
> but now let's assume neither 'a' nor 'd' is known to be untestable, and
> also 'a' gives slightly better "goodness" value than 'd' does.  Further
> suppose that 'z' was already known to be untestable this time.
>
> Even though 'a' may have a better goodness value, hence it is not
> skipped, shouldn't we be favoring 'd' over 'a' because 'a' is far likely
> to be untestable (due to closer blood tie with 'z') than 'd' is, and 'a'
> and 'd' would give roughly the same "goodness" value?
>
> This is why I said I do not understand your skipped_first logic.

I agree that we could probably analyse the graph much better than what my 
patch series does, but I think that it would be quite complex.

> > +static struct commit_list *managed_skipped(struct commit_list *list,
> > +					   struct commit_list **tried)
> > +{
> > +	int count, skipped_first;
> > +	int skip_num, skip_denom;
> > +
> > +	*tried = NULL;
> > +
> > +	if (!skipped_revs.sha1_nr)
> > +		return list;
> > +
> > +	if (!skip_num)
> > +		return filter_skipped(list, tried, 0, NULL, NULL);
>
> skip_num is uninitialized and then checked here.  Is it supposed to be
> "static int" with implied 0 initialization?

Ooops, I forgot to remove these 2 lines. They were usefull in my previous 
series but they should be removed in this one.

> > +	list = filter_skipped(list, tried, 0, &count, &skipped_first);
> > +
> > +	if (!skipped_first)
> > +		return list;
> > +
> > +	/* Use alternatively 1/5, 2/5 and 3/5 as skip ratio. */
> > +	skip_num = count % 3 + 1;
> > +	skip_denom = 5;
> > +
> > +	return apply_skip_ratio(list, count, skip_num, skip_denom);
> > +}

I will repost the series with the above fix and some added comments.

Thanks,
Christian.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v2 2/3] bisect: when skipping, choose a commit away from a skipped commit
  2009-06-06  4:38     ` Christian Couder
@ 2009-06-06  5:19       ` Junio C Hamano
  0 siblings, 0 replies; 9+ messages in thread
From: Junio C Hamano @ 2009-06-06  5:19 UTC (permalink / raw)
  To: Christian Couder; +Cc: git, Sam Vilain, H. Peter Anvin, Ingo Molnar

Christian Couder <chriscool@tuxfamily.org> writes:

> For example using the following script:
>
> -------------
> #!/bin/sh
>
> new_commit() {
>     num="$1"
>     echo "line for commit $num" >> hello
>     git commit -m "commit $num" hello
> }
>
> touch hello
> git add hello
>
> for val in $(seq 100)
> do
>   new_commit $val
> done

But isn't this a totally uninteresting case of 100 _single_ strand of
pearls?  For a linear history, you do not even need the bisect machinery;
you can even bisect by hand.

>> Imagine you have a graph like this (G is good, B is bad, and both
>> branches have similar number of commits):
>>
>> 	G-------------y--z--a--b--c--B
>>                               /
>>         G--------------f--e--d
>>
>> In the above graph, a and d would give the best bisection, because they
>> can reach about half the number of nodes in the entire graph, and the
>> goal of the basic "find_bisection()" is to find a commit that cuts the
>> graph into closest-to-the-equal halves.  For this reason, 'a' and 'd'
>> would be near the beginning of the output from find_bisection(find_all)
>> you run when there are untestable commits in your managed_skipped().
>>
>> Suppose 'd' was already marked untestable, but 'a' is not.  And 'd' has
>> slightly better "goodness" value than 'a'.
>>
>> 	Side note.  I do not think the situation should change drastically
>> 	if 'a' has a better "goodness" value than 'd' has, but your
>> 	"skipped_first" logic that I already said I do not understand in
>> 	my earlier comment treats them quite differently, so this example
>> 	explicitly talks about the case where 'd' is better than 'a'.
>>
>> You do not check 'a' but check somewhere much older, either on the top
>> branch or on the bottom branch.  Why?
>
> The reason is that it would make the code more complex to check that we are 
> in this case, and that this case may not happen very often (it relies on 
> both being near a merge and having branches of the same size), and that we 
> don't loose much (see above my reference to what HPA wrote) by testing a 
> commit quite far away.

I think what you are missing is that you are not even guaranteeing "quite
far away" in the _topological_ space, especially in the presense of merges.

If you focus on a totally linear history, yes, picking somewhere away, in
the "goodness" scale space, from the optimal point (and there is a single
optimal point) that is untestable happens to take you away from that
particular untestable one in the topology space as well.  But that only
holds true when you have no merges.

My example graph was not an extreme special case at all.  It is just an
illustration of a typical real-life history.  Sure, I told you to assume
that both sides have about equal number of commits, but if the top branch
were longer, then instead of 'a', perhaps 'y' or its parent may be the
next best commit.  It is still very close in the "goodness" space to the
untestable commit 'd', but it is very far away from it in the topological
space, but your algorithm discards it because it is close in the
"goodness" scale space.  And the distance in the topological space from
untestable ones is what we seek here.

>>  - The point chosen should be far from any known untestable commits.  We
>>    do not currently have function to count distance from untestable
>>    commits, nor make a call to such a function after filtering untestable
>>    commits, but I think we should.
>
> I think this should be the case with my patch series.

Why?  You are picking "away, in the goodness scale, from the _best_ one".
I've already explained why it does not follow that the commit chosen that
way is _topologically_ away from _untestable_ ones.

> I agree that we could probably analyse the graph much better than what my 
> patch series does, but I think that it would be quite complex.

If you do not want complexity, let's not even do this "away in the
goodness space from the best one".  Your three patches already add
complexity to the algorithm, and I do not think what they do is worth it.
It solves a wrong problem, that does not have anything to do with "stay
away from untestable ones in the topological space".

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2009-06-06  5:21 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2009-06-05  4:10 [PATCH v2 0/3] automatically skip away from broken commits Christian Couder
2009-06-05  4:10 ` [PATCH v2 1/3] bisect: add parameters to "filter_skipped" Christian Couder
2009-06-05  6:48   ` Junio C Hamano
2009-06-06  4:38     ` Christian Couder
2009-06-05  4:10 ` [PATCH v2 2/3] bisect: when skipping, choose a commit away from a skipped commit Christian Couder
2009-06-05  6:48   ` Junio C Hamano
2009-06-06  4:38     ` Christian Couder
2009-06-06  5:19       ` Junio C Hamano
2009-06-05  4:10 ` [PATCH v2 3/3] t6030: test skipping away from an already " Christian Couder

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).