Stochastic bisection support

git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed

* Stochastic bisection support
@ 2021-11-18 16:49 Jan Kara
  2021-11-18 16:49 ` [PATCH 01/27] bisect: Fixup test rev-list-bisect/02 Jan Kara
                   ` (29 more replies)
  0 siblings, 30 replies; 43+ messages in thread
From: Jan Kara @ 2021-11-18 16:49 UTC (permalink / raw)
  To: git; +Cc: Jan Kara

Hello,

In some cases regressions (or generally changes) we are trying to bisect have
probabilistic nature. This can for example happen for hard to trigger race
condition where it is difficult to distinguish working state from just not
hitting the race or it can happen for performance regressions where it is
sometimes difficult to distinguish random workload fluctuations from the
regression we are looking for. With standard bisection the only option we have
is to repeatedly test suggested bisection point until we are sure enough which
way to go. This leads to rather long bisection times and still a single wrong
decision whether a commit is good to bad renders the whole bisection useless.

Stochastic bisection tries to address these problems. When deciding whether a
commit is good or bad, you can also specify your confidence in the decision.
For performance tests you can usually directly infer this confidence from the
distance of your current result from good/bad values, for hard to reproduce
races you are usually 100% confident for bad commits, for good commits you need
to somehow estimate your confidence based on past experience with reproducing
the issue. The stochastic bisection algorithm then uses these test results
and confidences to suggest next commit to try, tracking for each commit the
probability the commit is the bad one given current test results. Once some
commit reaches high enough probability (set when starting bisection) of being
the bad one, we stop bisecting and annouce this commit.

Example:

Consider an example of a stochastic bisection of the following commits:

A--B--C--D-----F-----H--------K
 \     \  \-E-/     /        /
  \     \--------G-/        /
   \------------------I--J-/

And suppose commit I is the bad one. Let's start bisection with:

# We ask bisection for 90% confidence in the identified commit being bad
git bisect start --confidence 0.9 %K %A

# Bisection tells us to test %F. Let's assume test went well and we trust
# our test results on 70%. So:
git bisect good --confidence 0.7

# Bisection tells us to test %H. Again same result:
git bisect good --confidence 0.7

# Bisection tells us to test %J. The test should fail for %J (remember %I is
# the bad commit) but let's assume the test is falsely positive. So:
git bisect good --confidence 0.7

# We are asked to test %H second time. Assume correct result so:
git bisect good --confidence 0.7

# We are asked to test %J second time. Assume correct result so:
git bisect bad --confidence 0.7

# We are asked to test %J again. Assume correct result so:
git bisect bad --confidence 0.7

# And %J once more. Assume false positive so:
git bisect good --confidence 0.7

# And %J once more. Assume correct result so:
git bisect bad --confidence 0.7

# And %J again. Assume correct result so:
git bisect bad --confidence 0.7

# And now we are asked to test %I. Assume correct result so:
git bisect bad --confidence 0.7

# We are asked to test %I second time. Assume false positive so:
git bisect good --confidence 0.7

# And %I once again. Assume correct result so:
git bisect bad --confidence 0.7

# And %I once again. Assume correct result so:
git bisect bad --confidence 0.7

# And %I once again. Assume correct result so:
git bisect bad --confidence 0.7

And now git tells us %I is the bad commit with desired confidence. We can see
the bisection was able to identify the bad commit although there were three
false positive tests (out of total 14 tests).

------

This patch set implements stochastic bisection for git. The first part of the
series improves some tests so that they accept other valid decisions for
bisection points. This is needed because to make it easier to share some logic
between normal and stochastic bisection, I needed to slightly change some bits
for normal bisection and then since commit weights will be computed in a
somewhat different order, also chosen bisection points are sometimes different.

The second part of the series then implements stochastic bisection itself.
Note that I didn't integrate any tests for stochastic bisection into 'make
test' run yet (so far I did only manual tests) and I still need to update
manpages etc. I plan to do that but I've decided to post the series now to get
some early feedback.

								Honza

PS: Please leave me in CC for replies. I'm not subscribed to the git mailing
list.

^ permalink raw reply	[flat|nested] 43+ messages in thread

* [PATCH 01/27] bisect: Fixup test rev-list-bisect/02
  2021-11-18 16:49 Stochastic bisection support Jan Kara
@ 2021-11-18 16:49 ` Jan Kara
  2021-11-18 20:08   ` Chris Torek
  2021-11-18 16:49 ` [PATCH 02/27] bisect: Fixup bisect-porcelain/17 Jan Kara
                   ` (28 subsequent siblings)
  29 siblings, 1 reply; 43+ messages in thread
From: Jan Kara @ 2021-11-18 16:49 UTC (permalink / raw)
  To: git; +Cc: Jan Kara

Test 2 from t6002-rev-list-bisect.sh expects 'c2' as the bisection point
but b2 is an equivalent choice. Improve the test to accept both.

Signed-off-by: Jan Kara <jack@suse.cz>
---
 t/t6002-rev-list-bisect.sh | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/t/t6002-rev-list-bisect.sh b/t/t6002-rev-list-bisect.sh
index b95a0212adff..48db52447fd3 100755
--- a/t/t6002-rev-list-bisect.sh
+++ b/t/t6002-rev-list-bisect.sh
@@ -247,8 +247,9 @@ test_expect_success 'set up fake --bisect refs' '
 test_expect_success 'rev-list --bisect can default to good/bad refs' '
 	# the only thing between c3 and c1 is c2
 	git rev-parse c2 >expect &&
-	git rev-list --bisect >actual &&
-	test_cmp expect actual
+	git rev-parse b2 >>expect &&
+	actual=$(git rev-list --bisect) &&
+	grep &>/dev/null $actual expect
 '
 
 test_expect_success 'rev-parse --bisect can default to good/bad refs' '
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 43+ messages in thread

* [PATCH 02/27] bisect: Fixup bisect-porcelain/17
  2021-11-18 16:49 Stochastic bisection support Jan Kara
  2021-11-18 16:49 ` [PATCH 01/27] bisect: Fixup test rev-list-bisect/02 Jan Kara
@ 2021-11-18 16:49 ` Jan Kara
  2021-11-18 22:05   ` Taylor Blau
  2021-11-18 16:49 ` [PATCH 03/27] bisect: Fixup test bisect-porcelain/20 Jan Kara
                   ` (27 subsequent siblings)
  29 siblings, 1 reply; 43+ messages in thread
From: Jan Kara @ 2021-11-18 16:49 UTC (permalink / raw)
  To: git; +Cc: Jan Kara

Test 17 from t6030-bisect-porcelain.sh assumes that bisection algorithm
suggests first HASH3 where HASH2 and HASH3 are equivalent choices. Make
sure test correctly handles both choices, add test variant to properly
test commit skipping in the second case.

Signed-off-by: Jan Kara <jack@suse.cz>
---
 t/t6030-bisect-porcelain.sh | 18 +++++++++++++++++-
 1 file changed, 17 insertions(+), 1 deletion(-)

diff --git a/t/t6030-bisect-porcelain.sh b/t/t6030-bisect-porcelain.sh
index 1be85d064e76..f8cfdd3c36d2 100755
--- a/t/t6030-bisect-porcelain.sh
+++ b/t/t6030-bisect-porcelain.sh
@@ -197,11 +197,27 @@ test_expect_success 'bisect skip: successful result' '
 	test_when_finished git bisect reset &&
 	git bisect reset &&
 	git bisect start $HASH4 $HASH1 &&
-	git bisect skip &&
+	if [ $(git rev-parse HEAD) == $HASH3 ]; then
+		git bisect skip
+	fi &&
 	git bisect bad > my_bisect_log.txt &&
 	grep "$HASH2 is the first bad commit" my_bisect_log.txt
 '
 
+# $HASH1 is good, $HASH4 is bad, we skip $HASH2
+# but $HASH3 is good,
+# so we should find $HASH4 as the first bad commit
+test_expect_success 'bisect skip: successful result' '
+	test_when_finished git bisect reset &&
+	git bisect reset &&
+	git bisect start $HASH4 $HASH1 &&
+	if [ $(git rev-parse HEAD) == $HASH2 ]; then
+		git bisect skip
+	fi &&
+	git bisect good > my_bisect_log.txt &&
+	grep "$HASH4 is the first bad commit" my_bisect_log.txt
+'
+
 # $HASH1 is good, $HASH4 is bad, we skip $HASH3 and $HASH2
 # so we should not be able to tell the first bad commit
 # among $HASH2, $HASH3 and $HASH4
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 43+ messages in thread

* [PATCH 03/27] bisect: Fixup test bisect-porcelain/20
  2021-11-18 16:49 Stochastic bisection support Jan Kara
  2021-11-18 16:49 ` [PATCH 01/27] bisect: Fixup test rev-list-bisect/02 Jan Kara
  2021-11-18 16:49 ` [PATCH 02/27] bisect: Fixup bisect-porcelain/17 Jan Kara
@ 2021-11-18 16:49 ` Jan Kara
  2021-11-18 20:13   ` Chris Torek
  2021-11-18 16:49 ` [PATCH 04/27] bisect: Fixup bisect-porcelain/32 Jan Kara
                   ` (26 subsequent siblings)
  29 siblings, 1 reply; 43+ messages in thread
From: Jan Kara @ 2021-11-18 16:49 UTC (permalink / raw)
  To: git; +Cc: Jan Kara

Test 20 from t6030-bisect-porcelain.sh fails if the bisection algorithm
picks HASH2 instead of HASH3 as the first step although these are
equivalent. Fix the test to work in both cases.

Signed-off-by: Jan Kara <jack@suse.cz>
---
 t/t6030-bisect-porcelain.sh | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/t/t6030-bisect-porcelain.sh b/t/t6030-bisect-porcelain.sh
index f8cfdd3c36d2..13f7deea4d81 100755
--- a/t/t6030-bisect-porcelain.sh
+++ b/t/t6030-bisect-porcelain.sh
@@ -240,8 +240,13 @@ test_expect_success 'bisect skip: cannot tell between 3 commits' '
 test_expect_success 'bisect skip: cannot tell between 2 commits' '
 	test_when_finished git bisect reset &&
 	git bisect start $HASH4 $HASH1 &&
-	git bisect skip &&
-	test_expect_code 2 git bisect good >my_bisect_log.txt &&
+	if [ $(git rev-parse HEAD) == $HASH2 ]; then
+		results=('good' 'skip')
+	else
+		results=('skip' 'good')
+	fi &&
+	git bisect ${results[0]} &&
+	test_expect_code 2 git bisect ${results[1]} >my_bisect_log.txt &&
 	grep "first bad commit could be any of" my_bisect_log.txt &&
 	! grep $HASH1 my_bisect_log.txt &&
 	! grep $HASH2 my_bisect_log.txt &&
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 43+ messages in thread

* [PATCH 04/27] bisect: Fixup bisect-porcelain/32
  2021-11-18 16:49 Stochastic bisection support Jan Kara
                   ` (2 preceding siblings ...)
  2021-11-18 16:49 ` [PATCH 03/27] bisect: Fixup test bisect-porcelain/20 Jan Kara
@ 2021-11-18 16:49 ` Jan Kara
  2021-11-18 16:49 ` [PATCH 05/27] bisect: Fixup bisect-porcelain/34 Jan Kara
                   ` (25 subsequent siblings)
  29 siblings, 0 replies; 43+ messages in thread
From: Jan Kara @ 2021-11-18 16:49 UTC (permalink / raw)
  To: git; +Cc: Jan Kara

Test 32 from t6030-bisect-porcelain.sh assumes that bisection algorithm
suggests HASH6 after HASH4 when HASH5 is an equivalent choice. Fix the
test to work in both cases.

Signed-off-by: Jan Kara <jack@suse.cz>
---
 t/t6030-bisect-porcelain.sh | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/t/t6030-bisect-porcelain.sh b/t/t6030-bisect-porcelain.sh
index 13f7deea4d81..d693c0002098 100755
--- a/t/t6030-bisect-porcelain.sh
+++ b/t/t6030-bisect-porcelain.sh
@@ -395,9 +395,13 @@ test_expect_success 'bisect does not create a "bisect" branch' '
 	test "$rev_hash4" = "$HASH4" &&
 	git branch -D bisect &&
 	git bisect good &&
+	rev_hash=$(git rev-parse --verify HEAD) &&
+	if [ $rev_hash == "$HASH5" ]; then
+		git bisect good &&
+		rev_hash=$(git rev-parse --verify HEAD)
+	fi &&
 	git branch bisect &&
-	rev_hash6=$(git rev-parse --verify HEAD) &&
-	test "$rev_hash6" = "$HASH6" &&
+	test "$rev_hash" = "$HASH6" &&
 	git bisect good > my_bisect_log.txt &&
 	grep "$HASH7 is the first bad commit" my_bisect_log.txt &&
 	git bisect reset &&
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 43+ messages in thread

* [PATCH 05/27] bisect: Fixup bisect-porcelain/34
  2021-11-18 16:49 Stochastic bisection support Jan Kara
                   ` (3 preceding siblings ...)
  2021-11-18 16:49 ` [PATCH 04/27] bisect: Fixup bisect-porcelain/32 Jan Kara
@ 2021-11-18 16:49 ` Jan Kara
  2021-11-18 16:49 ` [PATCH 06/27] bisect: Fixup bisect-porcelain/40 Jan Kara
                   ` (24 subsequent siblings)
  29 siblings, 0 replies; 43+ messages in thread
From: Jan Kara @ 2021-11-18 16:49 UTC (permalink / raw)
  To: git; +Cc: Jan Kara

Test 34 from t6030-bisect-porcelain.sh assumes that bisection algorithm
suggests HASH6 after merge base when HASH5 is an equivalent choice. Fix
the test to work in both cases.

Signed-off-by: Jan Kara <jack@suse.cz>
---
 t/t6030-bisect-porcelain.sh | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/t/t6030-bisect-porcelain.sh b/t/t6030-bisect-porcelain.sh
index d693c0002098..ed81a6403b63 100755
--- a/t/t6030-bisect-porcelain.sh
+++ b/t/t6030-bisect-porcelain.sh
@@ -433,7 +433,7 @@ test_expect_success 'good merge base when good and bad are siblings' '
 	grep $HASH4 my_bisect_log.txt &&
 	git bisect good > my_bisect_log.txt &&
 	! grep "merge base must be tested" my_bisect_log.txt &&
-	grep $HASH6 my_bisect_log.txt &&
+	grep -E "$HASH5|$HASH6" my_bisect_log.txt &&
 	git bisect reset
 '
 test_expect_success 'skipped merge base when good and bad are siblings' '
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 43+ messages in thread

* [PATCH 06/27] bisect: Fixup bisect-porcelain/40
  2021-11-18 16:49 Stochastic bisection support Jan Kara
                   ` (4 preceding siblings ...)
  2021-11-18 16:49 ` [PATCH 05/27] bisect: Fixup bisect-porcelain/34 Jan Kara
@ 2021-11-18 16:49 ` Jan Kara
  2021-11-18 16:49 ` [PATCH 07/27] bisect: Remove duplicated bisect-porcelain/48 Jan Kara
                   ` (23 subsequent siblings)
  29 siblings, 0 replies; 43+ messages in thread
From: Jan Kara @ 2021-11-18 16:49 UTC (permalink / raw)
  To: git; +Cc: Jan Kara

Test 40 from t6030-bisect-porcelain.sh assumes that bisection algorithm
suggests HASH6 after merge base when HASH5 is an equivalent choice. Fix
the test to work in both cases.

Signed-off-by: Jan Kara <jack@suse.cz>
---
 t/t6030-bisect-porcelain.sh | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/t/t6030-bisect-porcelain.sh b/t/t6030-bisect-porcelain.sh
index ed81a6403b63..0f2a91996393 100755
--- a/t/t6030-bisect-porcelain.sh
+++ b/t/t6030-bisect-porcelain.sh
@@ -524,8 +524,9 @@ test_expect_success 'optimized merge base checks' '
 	grep "$HASH4" my_bisect_log.txt &&
 	git bisect good > my_bisect_log2.txt &&
 	test -f ".git/BISECT_ANCESTORS_OK" &&
-	test "$HASH6" = $(git rev-parse --verify HEAD) &&
-	git bisect bad &&
+	rev_hash=$(git rev-parse --verify HEAD) &&
+	test "$HASH5" = "$rev_hash" -o "$HASH6" = "$rev_hash" &&
+	git bisect bad "$HASH6" &&
 	git bisect good "$A_HASH" > my_bisect_log4.txt &&
 	test_i18ngrep "merge base must be tested" my_bisect_log4.txt &&
 	test_path_is_missing ".git/BISECT_ANCESTORS_OK"
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 43+ messages in thread

* [PATCH 07/27] bisect: Remove duplicated bisect-porcelain/48
  2021-11-18 16:49 Stochastic bisection support Jan Kara
                   ` (5 preceding siblings ...)
  2021-11-18 16:49 ` [PATCH 06/27] bisect: Fixup bisect-porcelain/40 Jan Kara
@ 2021-11-18 16:49 ` Jan Kara
  2021-11-18 16:49 ` [PATCH 08/27] bisect: Fixup bisect-porcelain/50 Jan Kara
                   ` (22 subsequent siblings)
  29 siblings, 0 replies; 43+ messages in thread
From: Jan Kara @ 2021-11-18 16:49 UTC (permalink / raw)
  To: git; +Cc: Jan Kara

Test 48 from t6030-bisect-porcelain.sh claims it tests whether bisection
fails if tree is broken on start commit. However that actually does not
happen because the bisection really only fails because the first trial
point we choose happens to be broken. Furthermore there is another
equivalent trial point which is not broken so the test is not reliable.
Remove it as test 49 tests the same behavior is a more reliable setting.

Signed-off-by: Jan Kara <jack@suse.cz>
---
 t/t6030-bisect-porcelain.sh | 6 ------
 1 file changed, 6 deletions(-)

diff --git a/t/t6030-bisect-porcelain.sh b/t/t6030-bisect-porcelain.sh
index 0f2a91996393..4ec7b5b5a72e 100755
--- a/t/t6030-bisect-porcelain.sh
+++ b/t/t6030-bisect-porcelain.sh
@@ -675,12 +675,6 @@ cat > expected.missing-tree.default <<EOF
 fatal: unable to read tree $deleted
 EOF
 
-test_expect_success 'bisect fails if tree is broken on start commit' '
-	git bisect reset &&
-	test_must_fail git bisect start BROKEN_HASH7 BROKEN_HASH4 2>error.txt &&
-	test_cmp expected.missing-tree.default error.txt
-'
-
 test_expect_success 'bisect fails if tree is broken on trial commit' '
 	git bisect reset &&
 	test_must_fail git bisect start BROKEN_HASH9 BROKEN_HASH4 2>error.txt &&
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 43+ messages in thread

* [PATCH 08/27] bisect: Fixup bisect-porcelain/50
  2021-11-18 16:49 Stochastic bisection support Jan Kara
                   ` (6 preceding siblings ...)
  2021-11-18 16:49 ` [PATCH 07/27] bisect: Remove duplicated bisect-porcelain/48 Jan Kara
@ 2021-11-18 16:49 ` Jan Kara
  2021-11-18 16:49 ` [PATCH 09/27] bisect: Fixup bisect-porcelain/54 Jan Kara
                   ` (21 subsequent siblings)
  29 siblings, 0 replies; 43+ messages in thread
From: Jan Kara @ 2021-11-18 16:49 UTC (permalink / raw)
  To: git; +Cc: Jan Kara

Test 50 from t6030-bisect-porcelain.sh assumes that bisection algorithm
suggests BROKEN_HASH6 when BROKEN_HASH5 is an equivalent choice. Fix the
test to work in both cases.

Signed-off-by: Jan Kara <jack@suse.cz>
---
 t/t6030-bisect-porcelain.sh | 15 ++++++++++++++-
 1 file changed, 14 insertions(+), 1 deletion(-)

diff --git a/t/t6030-bisect-porcelain.sh b/t/t6030-bisect-porcelain.sh
index 4ec7b5b5a72e..79f253f01b00 100755
--- a/t/t6030-bisect-porcelain.sh
+++ b/t/t6030-bisect-porcelain.sh
@@ -689,10 +689,23 @@ check_same()
 	test_cmp_rev "$1" "$2"
 }
 
+check_oneof()
+{
+	base="$1"
+	shift
+	echo "Checking $base is among $@" &&
+	for rev in "$@"; do
+		if test_cmp_rev "$base" "$rev"; then
+			return 0
+		fi
+	done
+	return 1
+}
+
 test_expect_success 'bisect: --no-checkout - start commit bad' '
 	git bisect reset &&
 	git bisect start BROKEN_HASH7 BROKEN_HASH4 --no-checkout &&
-	check_same BROKEN_HASH6 BISECT_HEAD &&
+	check_oneof BISECT_HEAD BROKEN_HASH5 BROKEN_HASH6 &&
 	git bisect reset
 '
 
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 43+ messages in thread

* [PATCH 09/27] bisect: Fixup bisect-porcelain/54
  2021-11-18 16:49 Stochastic bisection support Jan Kara
                   ` (7 preceding siblings ...)
  2021-11-18 16:49 ` [PATCH 08/27] bisect: Fixup bisect-porcelain/50 Jan Kara
@ 2021-11-18 16:49 ` Jan Kara
  2021-11-18 16:49 ` [PATCH 10/27] bisect: Fixup bisect-porcelain/58 Jan Kara
                   ` (20 subsequent siblings)
  29 siblings, 0 replies; 43+ messages in thread
From: Jan Kara @ 2021-11-18 16:49 UTC (permalink / raw)
  To: git; +Cc: Jan Kara

Test 54 from t6030-bisect-porcelain.sh assumes that bisection algorithm
suggests BROKEN_HASH8 as the second bisection point when BROKEN_HASH7 is
an equivalent choice. Fix the test to work in both cases.

Signed-off-by: Jan Kara <jack@suse.cz>
---
 t/t6030-bisect-porcelain.sh | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/t/t6030-bisect-porcelain.sh b/t/t6030-bisect-porcelain.sh
index 79f253f01b00..46d67929e1e5 100755
--- a/t/t6030-bisect-porcelain.sh
+++ b/t/t6030-bisect-porcelain.sh
@@ -743,7 +743,11 @@ test_expect_success 'bisect: --no-checkout - target after breakage' '
 	git bisect start broken BROKEN_HASH4 --no-checkout &&
 	check_same BROKEN_HASH6 BISECT_HEAD &&
 	git bisect good BISECT_HEAD &&
-	check_same BROKEN_HASH8 BISECT_HEAD &&
+	check_oneof BISECT_HEAD BROKEN_HASH7 BROKEN_HASH8 &&
+	if check_same BROKEN_HASH7 BISECT_HEAD; then
+		git bisect good BISECT_HEAD &&
+		check_same BROKEN_HASH8 BISECT_HEAD
+	fi &&
 	test_must_fail git bisect good BISECT_HEAD &&
 	check_same BROKEN_HASH9 bisect/bad &&
 	git bisect reset
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 43+ messages in thread

* [PATCH 10/27] bisect: Fixup bisect-porcelain/58
  2021-11-18 16:49 Stochastic bisection support Jan Kara
                   ` (8 preceding siblings ...)
  2021-11-18 16:49 ` [PATCH 09/27] bisect: Fixup bisect-porcelain/54 Jan Kara
@ 2021-11-18 16:49 ` Jan Kara
  2021-11-18 16:49 ` [PATCH 11/27] bisect: Fix bisection debugging Jan Kara
                   ` (19 subsequent siblings)
  29 siblings, 0 replies; 43+ messages in thread
From: Jan Kara @ 2021-11-18 16:49 UTC (permalink / raw)
  To: git; +Cc: Jan Kara

Test 58 from t6030-bisect-porcelain.sh assumes that bisection algorithm
suggests HASH6 as the last bisection step when HASH5 is an equivalent
choice. Fix the test to work in both cases.

Signed-off-by: Jan Kara <jack@suse.cz>
---
 t/t6030-bisect-porcelain.sh | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/t/t6030-bisect-porcelain.sh b/t/t6030-bisect-porcelain.sh
index 46d67929e1e5..3363dc765b9d 100755
--- a/t/t6030-bisect-porcelain.sh
+++ b/t/t6030-bisect-porcelain.sh
@@ -811,7 +811,8 @@ test_expect_success '"git bisect bad HEAD" behaves as "git bisect bad"' '
 	git bisect start HEAD $HASH1 &&
 	git bisect good HEAD &&
 	git bisect bad HEAD &&
-	test "$HASH6" = $(git rev-parse --verify HEAD) &&
+	rev=$(git rev-parse --verify HEAD) &&
+	test "$HASH5" = "$rev" -o "$HASH6" = "$rev" &&
 	git bisect reset
 '
 
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 43+ messages in thread

* [PATCH 11/27] bisect: Fix bisection debugging
  2021-11-18 16:49 Stochastic bisection support Jan Kara
                   ` (9 preceding siblings ...)
  2021-11-18 16:49 ` [PATCH 10/27] bisect: Fixup bisect-porcelain/58 Jan Kara
@ 2021-11-18 16:49 ` Jan Kara
  2021-11-18 16:49 ` [PATCH 12/27] bisect: Accept and store confidence with each decision Jan Kara
                   ` (18 subsequent siblings)
  29 siblings, 0 replies; 43+ messages in thread
From: Jan Kara @ 2021-11-18 16:49 UTC (permalink / raw)
  To: git; +Cc: Jan Kara

show_list() dumps commit weights associated with each commit. Thus it
needs commit_weight slab already initialized. Move initialization of
commit_weight slab before the first show_list() call.

Signed-off-by: Jan Kara <jack@suse.cz>
---
 bisect.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/bisect.c b/bisect.c
index 888949fba6b5..ab264b8ca879 100644
--- a/bisect.c
+++ b/bisect.c
@@ -395,8 +395,8 @@ void find_bisection(struct commit_list **commit_list, int *reaches,
 	struct commit_list *list, *p, *best, *next, *last;
 	int *weights;
 
-	show_list("bisection 2 entry", 0, 0, *commit_list);
 	init_commit_weight(&commit_weight);
+	show_list("bisection 2 entry", 0, 0, *commit_list);
 
 	/*
 	 * Count the number of total and tree-changing items on the
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 43+ messages in thread

* [PATCH 12/27] bisect: Accept and store confidence with each decision
  2021-11-18 16:49 Stochastic bisection support Jan Kara
                   ` (10 preceding siblings ...)
  2021-11-18 16:49 ` [PATCH 11/27] bisect: Fix bisection debugging Jan Kara
@ 2021-11-18 16:49 ` Jan Kara
  2021-11-18 16:49 ` [PATCH 13/27] bisect: Allow specifying desired result confidence Jan Kara
                   ` (17 subsequent siblings)
  29 siblings, 0 replies; 43+ messages in thread
From: Jan Kara @ 2021-11-18 16:49 UTC (permalink / raw)
  To: git; +Cc: Jan Kara

With each decision whether a commit is good or bad, accept an optional
confidence argument and store it in bisect log and special file with
probabilities a test is recording a 'bad' result. We will later use
these probabilities to alter decisions about the next bisection point.

Signed-off-by: Jan Kara <jack@suse.cz>
---
 bisect.c                 |  51 ++++++++++++++++
 builtin/bisect--helper.c | 129 ++++++++++++++++++++++++++++++++-------
 fixedpoint.h             |  29 +++++++++
 3 files changed, 186 insertions(+), 23 deletions(-)
 create mode 100644 fixedpoint.h

diff --git a/bisect.c b/bisect.c
index ab264b8ca879..0773a872c82b 100644
--- a/bisect.c
+++ b/bisect.c
@@ -16,8 +16,10 @@
 #include "commit-reach.h"
 #include "object-store.h"
 #include "dir.h"
+#include "fixedpoint.h"
 
 static struct oid_array good_revs;
+static struct oid_array ptest_revs;
 static struct oid_array skipped_revs;
 
 static struct object_id *current_bad_oid;
@@ -444,12 +446,17 @@ static int register_ref(const char *refname, const struct object_id *oid,
 			int flags, void *cb_data)
 {
 	struct strbuf good_prefix = STRBUF_INIT;
+	struct strbuf ptest_prefix = STRBUF_INIT;
+
 	strbuf_addstr(&good_prefix, term_good);
 	strbuf_addstr(&good_prefix, "-");
+	strbuf_addstr(&ptest_prefix, "ptest-");
 
 	if (!strcmp(refname, term_bad)) {
 		current_bad_oid = xmalloc(sizeof(*current_bad_oid));
 		oidcpy(current_bad_oid, oid);
+	} else if (starts_with(refname, ptest_prefix.buf)) {
+		oid_array_append(&ptest_revs, oid);
 	} else if (starts_with(refname, good_prefix.buf)) {
 		oid_array_append(&good_revs, oid);
 	} else if (starts_with(refname, "skip-")) {
@@ -474,8 +481,46 @@ static GIT_PATH_FUNC(git_path_bisect_start, "BISECT_START")
 static GIT_PATH_FUNC(git_path_bisect_log, "BISECT_LOG")
 static GIT_PATH_FUNC(git_path_bisect_terms, "BISECT_TERMS")
 static GIT_PATH_FUNC(git_path_bisect_first_parent, "BISECT_FIRST_PARENT")
+static GIT_PATH_FUNC(git_path_bisect_confidences, "BISECT_CONFIDENCES")
 static GIT_PATH_FUNC(git_path_head_name, "head-name")
 
+static void read_bisect_confidences(void)
+{
+	struct strbuf str = STRBUF_INIT;
+	const char *filename = git_path_bisect_confidences();
+	FILE *fp = fopen(filename, "r");
+
+	/* Just a regular bisection? */
+	if (!fp)
+		return;
+
+	while (strbuf_getline_lf(&str, fp) != EOF) {
+		fpnum_t prob;
+		char *spc;
+		char state;
+		struct object_id oid;
+
+		strbuf_trim(&str);
+		spc = strchr(str.buf, ' ');
+		if (!spc)
+			die(_("Badly formatted content in file '%s': %s"),
+			    filename, str.buf);
+		*spc = 0;
+		if (sscanf(spc + 1, "%c %" FPNUM_FMT, &state, &prob) != 2)
+			die(_("Cannot parse confidence in file '%s': %s"),
+			    filename, spc+1);
+		if (state != 'g' && state != 'b')
+			die(_("Unknown test state in file '%s': '%c'"),
+			    filename, state);
+		if (get_oid(str.buf, &oid))
+			die(_("Cannot get oid of rev '%s' from file '%s'"),
+			    str.buf, filename);
+		/* We'll use parsed data later */
+	}
+	strbuf_release(&str);
+	fclose(fp);
+}
+
 static void read_bisect_paths(struct strvec *array)
 {
 	struct strbuf str = STRBUF_INIT;
@@ -661,6 +706,10 @@ static void bisect_rev_setup(struct repository *r, struct rev_info *revs,
 
 	/* rev_argv.argv[0] will be ignored by setup_revisions */
 	strvec_push(&rev_argv, "bisect_rev_setup");
+	/*
+	 * We use only revision certainly known to be good or bad for limiting
+	 * a search.
+	 */
 	strvec_pushf(&rev_argv, bad_format, oid_to_hex(current_bad_oid));
 	for (i = 0; i < good_revs.nr; i++)
 		strvec_pushf(&rev_argv, good_format,
@@ -1023,6 +1072,7 @@ enum bisect_error bisect_next_all(struct repository *r, const char *prefix)
 	read_bisect_terms(&term_bad, &term_good);
 	if (read_bisect_refs())
 		die(_("reading bisect refs failed"));
+	read_bisect_confidences();
 
 	if (file_exists(git_path_bisect_first_parent()))
 		bisect_flags |= FIND_BISECTION_FIRST_PARENT_ONLY;
@@ -1172,6 +1222,7 @@ int bisect_clean_state(void)
 	unlink_or_warn(git_path_bisect_run());
 	unlink_or_warn(git_path_bisect_terms());
 	unlink_or_warn(git_path_bisect_first_parent());
+	unlink_or_warn(git_path_bisect_confidences());
 	/* Cleanup head-name if it got left by an old version of git-bisect */
 	unlink_or_warn(git_path_head_name());
 	/*
diff --git a/builtin/bisect--helper.c b/builtin/bisect--helper.c
index 28a2e6a5750b..f88feb8da949 100644
--- a/builtin/bisect--helper.c
+++ b/builtin/bisect--helper.c
@@ -9,6 +9,7 @@
 #include "prompt.h"
 #include "quote.h"
 #include "revision.h"
+#include "fixedpoint.h"
 
 static GIT_PATH_FUNC(git_path_bisect_terms, "BISECT_TERMS")
 static GIT_PATH_FUNC(git_path_bisect_expected_rev, "BISECT_EXPECTED_REV")
@@ -19,6 +20,7 @@ static GIT_PATH_FUNC(git_path_head_name, "head-name")
 static GIT_PATH_FUNC(git_path_bisect_names, "BISECT_NAMES")
 static GIT_PATH_FUNC(git_path_bisect_first_parent, "BISECT_FIRST_PARENT")
 static GIT_PATH_FUNC(git_path_bisect_run, "BISECT_RUN")
+static GIT_PATH_FUNC(git_path_bisect_confidences, "BISECT_CONFIDENCES")
 
 static const char * const git_bisect_helper_usage[] = {
 	N_("git bisect--helper --bisect-reset [<commit>]"),
@@ -26,8 +28,8 @@ static const char * const git_bisect_helper_usage[] = {
 	N_("git bisect--helper --bisect-start [--term-{new,bad}=<term> --term-{old,good}=<term>]"
 					    " [--no-checkout] [--first-parent] [<bad> [<good>...]] [--] [<paths>...]"),
 	N_("git bisect--helper --bisect-next"),
-	N_("git bisect--helper --bisect-state (bad|new) [<rev>]"),
-	N_("git bisect--helper --bisect-state (good|old) [<rev>...]"),
+	N_("git bisect--helper --bisect-state (bad|new) [--confidence <conf>] [<rev>]"),
+	N_("git bisect--helper --bisect-state (good|old) [--confidence <conf>] [<rev>...]"),
 	N_("git bisect--helper --bisect-replay <filename>"),
 	N_("git bisect--helper --bisect-skip [(<rev>|<range>)...]"),
 	N_("git bisect--helper --bisect-visualize"),
@@ -254,18 +256,24 @@ static void log_commit(FILE *fp, char *fmt, const char *state,
 	free(label);
 }
 
-static int bisect_write(const char *state, const char *rev,
+static int bisect_write(const char *state, const char *rev, fpnum_t confidence,
 			const struct bisect_terms *terms, int nolog)
 {
+	const char *logstate = state;
 	struct strbuf tag = STRBUF_INIT;
 	struct object_id oid;
 	struct commit *commit;
 	FILE *fp = NULL;
 	int res = 0;
 
+	/* Uncertain result? */
+	if (one_of(state, terms->term_bad, terms->term_good, NULL) &&
+	    confidence != FP_ONE)
+		state = "ptest";
+
 	if (!strcmp(state, terms->term_bad)) {
 		strbuf_addf(&tag, "refs/bisect/%s", state);
-	} else if (one_of(state, terms->term_good, "skip", NULL)) {
+	} else if (one_of(state, terms->term_good, "skip", "ptest", NULL)) {
 		strbuf_addf(&tag, "refs/bisect/%s-%s", state, rev);
 	} else {
 		res = error(_("Bad bisect_write argument: %s"), state);
@@ -283,6 +291,24 @@ static int bisect_write(const char *state, const char *rev,
 		goto finish;
 	}
 
+	/* Store confidence if it is non-trivial */
+	if (!strcmp(state, "ptest")) {
+		char cstate;
+
+		fp = fopen(git_path_bisect_confidences(), "a");
+		if (!fp) {
+			res = error_errno(_("couldn't open the file '%s'"),
+				git_path_bisect_confidences());
+			goto finish;
+		}
+		if (!strcmp(logstate, terms->term_bad))
+			cstate = 'b';
+		else
+			cstate = 'g';
+		fprintf(fp, "%s %c %" FPNUM_FMT "\n", rev, cstate, confidence);
+		fclose(fp);
+	}
+
 	fp = fopen(git_path_bisect_log(), "a");
 	if (!fp) {
 		res = error_errno(_("couldn't open the file '%s'"), git_path_bisect_log());
@@ -290,10 +316,16 @@ static int bisect_write(const char *state, const char *rev,
 	}
 
 	commit = lookup_commit_reference(the_repository, &oid);
-	log_commit(fp, "%s", state, commit);
+	log_commit(fp, "%s", logstate, commit);
 
-	if (!nolog)
-		fprintf(fp, "git bisect %s %s\n", state, rev);
+	if (!nolog) {
+		if (!strcmp(state, "ptest")) {
+			fprintf(fp, "git bisect %s --confidence %lf %s\n",
+				logstate, fp_to_double(confidence), rev);
+		} else {
+			fprintf(fp, "git bisect %s %s\n", logstate, rev);
+		}
+	}
 
 finish:
 	if (fp)
@@ -612,6 +644,16 @@ static enum bisect_error bisect_auto_next(struct bisect_terms *terms, const char
 	return bisect_next(terms, prefix);
 }
 
+static int parse_confidence(const char *str, fpnum_t *confidence)
+{
+	double confd;
+
+	if (sscanf(str, "%lf", &confd) != 1 || confd < 0 || confd > 1)
+		return error(_("invalid confidence '%s'"), str);
+	*confidence = double_to_fp(confd);
+	return 0;
+}
+
 static enum bisect_error bisect_start(struct bisect_terms *terms, const char **argv, int argc)
 {
 	int no_checkout = 0;
@@ -784,8 +826,8 @@ static enum bisect_error bisect_start(struct bisect_terms *terms, const char **a
 	write_file(git_path_bisect_names(), "%s\n", bisect_names.buf);
 
 	for (i = 0; i < states.nr; i++)
-		if (bisect_write(states.items[i].string,
-				 revs.items[i].string, terms, 1)) {
+		if (bisect_write(states.items[i].string, revs.items[i].string,
+		    FP_ONE, terms, 1)) {
 			res = BISECT_FAILED;
 			goto finish;
 		}
@@ -854,6 +896,7 @@ static enum bisect_error bisect_state(struct bisect_terms *terms, const char **a
 	struct object_id oid, expected;
 	struct strbuf buf = STRBUF_INIT;
 	struct oid_array revs = OID_ARRAY_INIT;
+	fpnum_t confidence = FP_ONE;
 
 	if (!argc)
 		return error(_("Please call `--bisect-state` with at least one argument"));
@@ -868,6 +911,15 @@ static enum bisect_error bisect_state(struct bisect_terms *terms, const char **a
 
 	argv++;
 	argc--;
+
+	if (argc > 0 && !strcmp(argv[0], "--confidence")) {
+		if (argc < 2)
+			return error(_("missing confidence argument"));
+		if (parse_confidence(argv[1], &confidence))
+			return -1;
+		argv += 2;
+		argc -= 2;
+	}
 	if (argc > 1 && !strcmp(state, terms->term_bad))
 		return error(_("'git bisect %s' can take only one argument."), terms->term_bad);
 
@@ -912,7 +964,8 @@ static enum bisect_error bisect_state(struct bisect_terms *terms, const char **a
 	strbuf_release(&buf);
 
 	for (i = 0; i < revs.nr; i++) {
-		if (bisect_write(state, oid_to_hex(&revs.oid[i]), terms, 0)) {
+		if (bisect_write(state, oid_to_hex(&revs.oid[i]), confidence,
+				 terms, 0)) {
 			oid_array_clear(&revs);
 			return BISECT_FAILED;
 		}
@@ -946,39 +999,69 @@ static enum bisect_error bisect_log(void)
 
 static int process_replay_line(struct bisect_terms *terms, struct strbuf *line)
 {
-	const char *p = line->buf + strspn(line->buf, " \t");
-	char *word_end, *rev;
+	char *p = line->buf + strspn(line->buf, " \t");
+	char *word_end, *cmd;
 
-	if ((!skip_prefix(p, "git bisect", &p) &&
-	!skip_prefix(p, "git-bisect", &p)) || !isspace(*p))
+	if ((!skip_prefix(p, "git bisect", (const char **)&p) &&
+	!skip_prefix(p, "git-bisect", (const char **)&p)) || !isspace(*p))
 		return 0;
 	p += strspn(p, " \t");
 
+	cmd = p;
 	word_end = (char *)p + strcspn(p, " \t");
-	rev = word_end + strspn(word_end, " \t");
+	p = word_end + strspn(word_end, " \t");
 	*word_end = '\0'; /* NUL-terminate the word */
 
 	get_terms(terms);
-	if (check_and_set_terms(terms, p))
+	if (check_and_set_terms(terms, cmd))
 		return -1;
 
-	if (!strcmp(p, "start")) {
+	if (!strcmp(cmd, "start")) {
 		struct strvec argv = STRVEC_INIT;
 		int res;
-		sq_dequote_to_strvec(rev, &argv);
+		sq_dequote_to_strvec(p, &argv);
 		res = bisect_start(terms, argv.v, argv.nr);
 		strvec_clear(&argv);
 		return res;
 	}
 
-	if (one_of(p, terms->term_good,
-	   terms->term_bad, "skip", NULL))
-		return bisect_write(p, rev, terms, 0);
+	if (one_of(cmd, terms->term_good,
+	   terms->term_bad, "skip", NULL)) {
+		fpnum_t confidence = FP_ONE;
+		char *arg;
+
+		arg = p;
+		word_end = (char *)p + strcspn(p, " \t");
+		p = word_end + strspn(word_end, " \t");
+		*word_end = '\0';
+		if (!strcmp(arg, "--confidence")) {
+			if (!strcmp(cmd, "skip")) {
+				error(_("skipping does not take confidence"));
+				return -1;
+			}
+			if (!*p) {
+				error(_("missing bisect confidence argument"));
+				return -1;
+			}
+			arg = p;
+			word_end = (char *)p + strcspn(p, " \t");
+			p = word_end + strspn(word_end, " \t");
+			*word_end = '\0';
+			if (parse_confidence(arg, &confidence))
+				return -1;
+			arg = p;
+		}
+		if (!*arg) {
+			error(_("missing bisect revision"));
+			return -1;
+		}
+		return bisect_write(cmd, arg, confidence, terms, 0);
+	}
 
-	if (!strcmp(p, "terms")) {
+	if (!strcmp(cmd, "terms")) {
 		struct strvec argv = STRVEC_INIT;
 		int res;
-		sq_dequote_to_strvec(rev, &argv);
+		sq_dequote_to_strvec(p, &argv);
 		res = bisect_terms(terms, argv.nr == 1 ? argv.v[0] : NULL);
 		strvec_clear(&argv);
 		return res;
diff --git a/fixedpoint.h b/fixedpoint.h
new file mode 100644
index 000000000000..addef223be2b
--- /dev/null
+++ b/fixedpoint.h
@@ -0,0 +1,29 @@
+#ifndef FIXEDPOINT_H
+#define FIXEDPOINT_H
+
+#include <inttypes.h>
+
+#define FIXEDPOINT_SHIFT 32
+
+typedef uint64_t fpnum_t;
+
+#define FPNUM_FMT PRIu64
+
+static inline const fpnum_t frac_to_fp(unsigned int n, unsigned int d)
+{
+	return (((fpnum_t)n) << FIXEDPOINT_SHIFT) / d;
+}
+
+static inline const fpnum_t double_to_fp(double n)
+{
+	return (n * (1ULL << FIXEDPOINT_SHIFT));
+}
+
+static inline const double fp_to_double(fpnum_t n)
+{
+	return ((double)n) / (1ULL << FIXEDPOINT_SHIFT);
+}
+
+#define FP_ONE frac_to_fp(1, 1)
+
+#endif
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 43+ messages in thread

* [PATCH 13/27] bisect: Allow specifying desired result confidence
  2021-11-18 16:49 Stochastic bisection support Jan Kara
                   ` (11 preceding siblings ...)
  2021-11-18 16:49 ` [PATCH 12/27] bisect: Accept and store confidence with each decision Jan Kara
@ 2021-11-18 16:49 ` Jan Kara
  2021-11-18 16:49 ` [PATCH 14/27] bisect: Use void * for commit_weight Jan Kara
                   ` (16 subsequent siblings)
  29 siblings, 0 replies; 43+ messages in thread
From: Jan Kara @ 2021-11-18 16:49 UTC (permalink / raw)
  To: git; +Cc: Jan Kara

Allow specifying desired result confidence for stochastic bisection
when starting bisection. Store it and load it when doing bisection
steps.

Signed-off-by: Jan Kara <jack@suse.cz>
---
 bisect.c                 | 15 ++++++++++++++-
 builtin/bisect--helper.c | 22 +++++++++++++++++++++-
 fixedpoint.h             |  1 +
 3 files changed, 36 insertions(+), 2 deletions(-)

diff --git a/bisect.c b/bisect.c
index 0773a872c82b..f87753d0c67c 100644
--- a/bisect.c
+++ b/bisect.c
@@ -22,6 +22,8 @@ static struct oid_array good_revs;
 static struct oid_array ptest_revs;
 static struct oid_array skipped_revs;
 
+static fpnum_t result_confidence;
+
 static struct object_id *current_bad_oid;
 
 static const char *argv_checkout[] = {"checkout", "-q", NULL, "--", NULL};
@@ -482,15 +484,25 @@ static GIT_PATH_FUNC(git_path_bisect_log, "BISECT_LOG")
 static GIT_PATH_FUNC(git_path_bisect_terms, "BISECT_TERMS")
 static GIT_PATH_FUNC(git_path_bisect_first_parent, "BISECT_FIRST_PARENT")
 static GIT_PATH_FUNC(git_path_bisect_confidences, "BISECT_CONFIDENCES")
+static GIT_PATH_FUNC(git_path_bisect_result_confidence, "BISECT_RESULT_CONFIDENCE")
 static GIT_PATH_FUNC(git_path_head_name, "head-name")
 
 static void read_bisect_confidences(void)
 {
 	struct strbuf str = STRBUF_INIT;
-	const char *filename = git_path_bisect_confidences();
+	const char *filename = git_path_bisect_result_confidence();
 	FILE *fp = fopen(filename, "r");
 
 	/* Just a regular bisection? */
+	if (!fp)
+		return;
+	if (fscanf(fp, "%"FPNUM_FMT, &result_confidence) != 1)
+		die(_("Cannot parse result confidence in file '%s'"), filename);
+	fclose(fp);
+
+	/* No uncertain bisection steps yet? */
+	filename = git_path_bisect_confidences();
+	fp = fopen(filename, "r");
 	if (!fp)
 		return;
 
@@ -1223,6 +1235,7 @@ int bisect_clean_state(void)
 	unlink_or_warn(git_path_bisect_terms());
 	unlink_or_warn(git_path_bisect_first_parent());
 	unlink_or_warn(git_path_bisect_confidences());
+	unlink_or_warn(git_path_bisect_result_confidence());
 	/* Cleanup head-name if it got left by an old version of git-bisect */
 	unlink_or_warn(git_path_head_name());
 	/*
diff --git a/builtin/bisect--helper.c b/builtin/bisect--helper.c
index f88feb8da949..5b46a8ca3fd9 100644
--- a/builtin/bisect--helper.c
+++ b/builtin/bisect--helper.c
@@ -21,12 +21,14 @@ static GIT_PATH_FUNC(git_path_bisect_names, "BISECT_NAMES")
 static GIT_PATH_FUNC(git_path_bisect_first_parent, "BISECT_FIRST_PARENT")
 static GIT_PATH_FUNC(git_path_bisect_run, "BISECT_RUN")
 static GIT_PATH_FUNC(git_path_bisect_confidences, "BISECT_CONFIDENCES")
+static GIT_PATH_FUNC(git_path_bisect_result_confidence, "BISECT_RESULT_CONFIDENCE")
 
 static const char * const git_bisect_helper_usage[] = {
 	N_("git bisect--helper --bisect-reset [<commit>]"),
 	N_("git bisect--helper --bisect-terms [--term-good | --term-old | --term-bad | --term-new]"),
 	N_("git bisect--helper --bisect-start [--term-{new,bad}=<term> --term-{old,good}=<term>]"
-					    " [--no-checkout] [--first-parent] [<bad> [<good>...]] [--] [<paths>...]"),
+					    " [--no-checkout] [--first-parent] [--confidence <conf>]"
+					    " [<bad> [<good>...]] [--] [<paths>...]"),
 	N_("git bisect--helper --bisect-next"),
 	N_("git bisect--helper --bisect-state (bad|new) [--confidence <conf>] [<rev>]"),
 	N_("git bisect--helper --bisect-state (good|old) [--confidence <conf>] [<rev>...]"),
@@ -668,6 +670,7 @@ static enum bisect_error bisect_start(struct bisect_terms *terms, const char **a
 	struct object_id head_oid;
 	struct object_id oid;
 	const char *head;
+	fpnum_t confidence = FP_ONE;
 
 	if (is_bare_repository())
 		no_checkout = 1;
@@ -690,6 +693,16 @@ static enum bisect_error bisect_start(struct bisect_terms *terms, const char **a
 			no_checkout = 1;
 		} else if (!strcmp(arg, "--first-parent")) {
 			first_parent_only = 1;
+		} else if (!strcmp(arg, "--confidence")) {
+			i++;
+			if (argc <= i)
+				return error(_("missing confidence argument"));
+			if (parse_confidence(argv[i], &confidence))
+				return -1;
+			if (confidence == FP_ONE)
+				return error(_("Absolute confidence not possible with stochastic bisection"));
+			if (confidence < FP_HALF)
+				return error(_("Target confidence of at least 0.5 needed for stochastic bisection"));
 		} else if (!strcmp(arg, "--term-good") ||
 			 !strcmp(arg, "--term-old")) {
 			i++;
@@ -809,6 +822,10 @@ static enum bisect_error bisect_start(struct bisect_terms *terms, const char **a
 	if (first_parent_only)
 		write_file(git_path_bisect_first_parent(), "\n");
 
+	if (confidence != FP_ONE)
+		write_file(git_path_bisect_result_confidence(),
+			   "%" FPNUM_FMT "\n", confidence);
+
 	if (no_checkout) {
 		if (get_oid(start_head.buf, &oid) < 0) {
 			res = error(_("invalid ref: '%s'"), start_head.buf);
@@ -917,6 +934,9 @@ static enum bisect_error bisect_state(struct bisect_terms *terms, const char **a
 			return error(_("missing confidence argument"));
 		if (parse_confidence(argv[1], &confidence))
 			return -1;
+		if (is_empty_or_missing_file(git_path_bisect_result_confidence()))
+			return error(_("Stochastic bisection not started. Pass "
+				"desired target confidence to git bisect start."));
 		argv += 2;
 		argc -= 2;
 	}
diff --git a/fixedpoint.h b/fixedpoint.h
index addef223be2b..3f6234c6530a 100644
--- a/fixedpoint.h
+++ b/fixedpoint.h
@@ -25,5 +25,6 @@ static inline const double fp_to_double(fpnum_t n)
 }
 
 #define FP_ONE frac_to_fp(1, 1)
+#define FP_HALF frac_to_fp(1, 2)
 
 #endif
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 43+ messages in thread

* [PATCH 14/27] bisect: Use void * for commit_weight
  2021-11-18 16:49 Stochastic bisection support Jan Kara
                   ` (12 preceding siblings ...)
  2021-11-18 16:49 ` [PATCH 13/27] bisect: Allow specifying desired result confidence Jan Kara
@ 2021-11-18 16:49 ` Jan Kara
  2021-11-18 16:49 ` [PATCH 15/27] bisect: Rename clear_distance() to clear_counted_flag() Jan Kara
                   ` (15 subsequent siblings)
  29 siblings, 0 replies; 43+ messages in thread
From: Jan Kara @ 2021-11-18 16:49 UTC (permalink / raw)
  To: git; +Cc: Jan Kara

For stochastic bisection we will need to store more information per
commit. Make commit_weight slab store void * instead of int * so that we
can reuse it for stochastic bisection as well.

Signed-off-by: Jan Kara <jack@suse.cz>
---
 bisect.c | 14 ++++++++++----
 1 file changed, 10 insertions(+), 4 deletions(-)

diff --git a/bisect.c b/bisect.c
index f87753d0c67c..7416a57db4e3 100644
--- a/bisect.c
+++ b/bisect.c
@@ -77,19 +77,25 @@ static void clear_distance(struct commit_list *list)
 	}
 }
 
-define_commit_slab(commit_weight, int *);
+define_commit_slab(commit_weight, void *);
 static struct commit_weight commit_weight;
 
 #define DEBUG_BISECT 0
 
+static inline int has_weight(struct commit_list *elem)
+{
+	return commit_weight.slab_size > 0 &&
+	       *commit_weight_at(&commit_weight, elem->item) != NULL;
+}
+
 static inline int weight(struct commit_list *elem)
 {
-	return **commit_weight_at(&commit_weight, elem->item);
+	return *(int *)*commit_weight_at(&commit_weight, elem->item);
 }
 
 static inline void weight_set(struct commit_list *elem, int weight)
 {
-	**commit_weight_at(&commit_weight, elem->item) = weight;
+	*(int *)*commit_weight_at(&commit_weight, elem->item) = weight;
 }
 
 static int count_interesting_parents(struct commit *commit, unsigned bisect_flags)
@@ -163,7 +169,7 @@ static void show_list(const char *debug, int counted, int nr,
 			(commit_flags & TREESAME) ? ' ' : 'T',
 			(commit_flags & UNINTERESTING) ? 'U' : ' ',
 			(commit_flags & COUNTED) ? 'C' : ' ');
-		if (*commit_weight_at(&commit_weight, p->item))
+		if (has_weight(p))
 			fprintf(stderr, "%3d", weight(p));
 		else
 			fprintf(stderr, "---");
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 43+ messages in thread

* [PATCH 15/27] bisect: Rename clear_distance() to clear_counted_flag()
  2021-11-18 16:49 Stochastic bisection support Jan Kara
                   ` (13 preceding siblings ...)
  2021-11-18 16:49 ` [PATCH 14/27] bisect: Use void * for commit_weight Jan Kara
@ 2021-11-18 16:49 ` Jan Kara
  2021-11-18 16:49 ` [PATCH 16/27] bisect: Separate commit list reversal Jan Kara
                   ` (14 subsequent siblings)
  29 siblings, 0 replies; 43+ messages in thread
From: Jan Kara @ 2021-11-18 16:49 UTC (permalink / raw)
  To: git; +Cc: Jan Kara

clear_distance() only clears COUNTED flag. Rename the function to match
what it does. No code changes.

Signed-off-by: Jan Kara <jack@suse.cz>
---
 bisect.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/bisect.c b/bisect.c
index 7416a57db4e3..675e8d433760 100644
--- a/bisect.c
+++ b/bisect.c
@@ -68,7 +68,7 @@ static int count_distance(struct commit_list *entry)
 	return nr;
 }
 
-static void clear_distance(struct commit_list *list)
+static void clear_counted_flag(struct commit_list *list)
 {
 	while (list) {
 		struct commit *commit = list->item;
@@ -339,7 +339,7 @@ static struct commit_list *do_find_bisection(struct commit_list *list,
 		if (bisect_flags & FIND_BISECTION_FIRST_PARENT_ONLY)
 			BUG("shouldn't be calling count-distance in fp mode");
 		weight_set(p, count_distance(p));
-		clear_distance(list);
+		clear_counted_flag(list);
 
 		/* Does it happen to be at half-way? */
 		if (!(bisect_flags & FIND_BISECTION_ALL) &&
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 43+ messages in thread

* [PATCH 16/27] bisect: Separate commit list reversal
  2021-11-18 16:49 Stochastic bisection support Jan Kara
                   ` (14 preceding siblings ...)
  2021-11-18 16:49 ` [PATCH 15/27] bisect: Rename clear_distance() to clear_counted_flag() Jan Kara
@ 2021-11-18 16:49 ` Jan Kara
  2021-11-18 16:49 ` [PATCH 17/27] bisect: Allow more complex commit weights Jan Kara
                   ` (13 subsequent siblings)
  29 siblings, 0 replies; 43+ messages in thread
From: Jan Kara @ 2021-11-18 16:49 UTC (permalink / raw)
  To: git; +Cc: Jan Kara

Item list reversal is part of code counting number of list items
changing tree. Move it into the separate function and do the reversal
after counting. The stochastic bisection will need to do more operations
after the counting but before reversing the list.

Signed-off-by: Jan Kara <jack@suse.cz>
---
 bisect.c | 29 +++++++++++++++++++++--------
 1 file changed, 21 insertions(+), 8 deletions(-)

diff --git a/bisect.c b/bisect.c
index 675e8d433760..8dc1eb7f9d82 100644
--- a/bisect.c
+++ b/bisect.c
@@ -398,11 +398,24 @@ static struct commit_list *do_find_bisection(struct commit_list *list,
 		return best_bisection_sorted(list, nr);
 }
 
+
+static struct commit_list *reverse_list(struct commit_list *list)
+{
+	struct commit_list *p, *next, *last = NULL;
+
+	for (p = list; p; p = next) {
+		next = p->next;
+		p->next = last;
+		last = p;
+	}
+	return last;
+}
+
 void find_bisection(struct commit_list **commit_list, int *reaches,
 		    int *all, unsigned bisect_flags)
 {
 	int nr, on_list;
-	struct commit_list *list, *p, *best, *next, *last;
+	struct commit_list *list, *p, *best, *next, **pnext;
 	int *weights;
 
 	init_commit_weight(&commit_weight);
@@ -410,25 +423,25 @@ void find_bisection(struct commit_list **commit_list, int *reaches,
 
 	/*
 	 * Count the number of total and tree-changing items on the
-	 * list, while reversing the list.
+	 * list and trim uninteresting items from the list.
 	 */
-	for (nr = on_list = 0, last = NULL, p = *commit_list;
-	     p;
-	     p = next) {
+	list = *commit_list;
+	pnext = &list;
+	for (nr = on_list = 0, p = list; p; p = next) {
 		unsigned commit_flags = p->item->object.flags;
 
 		next = p->next;
 		if (commit_flags & UNINTERESTING) {
+			*pnext = next;
 			free(p);
 			continue;
 		}
-		p->next = last;
-		last = p;
+		pnext = &p->next;
 		if (!(commit_flags & TREESAME))
 			nr++;
 		on_list++;
 	}
-	list = last;
+	list = reverse_list(list);
 	show_list("bisection 2 sorted", 0, nr, list);
 
 	*all = nr;
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 43+ messages in thread

* [PATCH 17/27] bisect: Allow more complex commit weights
  2021-11-18 16:49 Stochastic bisection support Jan Kara
                   ` (15 preceding siblings ...)
  2021-11-18 16:49 ` [PATCH 16/27] bisect: Separate commit list reversal Jan Kara
@ 2021-11-18 16:49 ` Jan Kara
  2021-11-18 16:49 ` [PATCH 18/27] bisect: Terminate early if there are no eligible commits Jan Kara
                   ` (12 subsequent siblings)
  29 siblings, 0 replies; 43+ messages in thread
From: Jan Kara @ 2021-11-18 16:49 UTC (permalink / raw)
  To: git; +Cc: Jan Kara

For stochastic bisection we will need to keep more information for each
commit than plain int. Factor out initialization of commit weight
storage into a separate function so that stochastic bisection can more
easily alter it.

Signed-off-by: Jan Kara <jack@suse.cz>
---
 bisect.c | 28 ++++++++++++++++++++--------
 1 file changed, 20 insertions(+), 8 deletions(-)

diff --git a/bisect.c b/bisect.c
index 8dc1eb7f9d82..dd2f6b68ae3d 100644
--- a/bisect.c
+++ b/bisect.c
@@ -280,19 +280,17 @@ static struct commit_list *best_bisection_sorted(struct commit_list *list, int n
  * or positive distance.
  */
 static struct commit_list *do_find_bisection(struct commit_list *list,
-					     int nr, int *weights,
-					     unsigned bisect_flags)
+					     int nr, unsigned bisect_flags)
 {
-	int n, counted;
+	int counted;
 	struct commit_list *p;
 
 	counted = 0;
 
-	for (n = 0, p = list; p; p = p->next) {
+	for (p = list; p; p = p->next) {
 		struct commit *commit = p->item;
 		unsigned commit_flags = commit->object.flags;
 
-		*commit_weight_at(&commit_weight, p->item) = &weights[n++];
 		switch (count_interesting_parents(commit, bisect_flags)) {
 		case 0:
 			if (!(commit_flags & TREESAME)) {
@@ -398,6 +396,20 @@ static struct commit_list *do_find_bisection(struct commit_list *list,
 		return best_bisection_sorted(list, nr);
 }
 
+static void *setup_commit_weight_array(struct commit_list *list, int nodes)
+{
+	int entry_size = sizeof(int);
+	void *array;
+	struct commit_list *p;
+	int n;
+
+	array = xcalloc(nodes, entry_size);
+	for (n = 0, p = list; p; p = p->next, n++) {
+		*commit_weight_at(&commit_weight, p->item) =
+						array + n * entry_size;
+	}
+	return array;
+}
 
 static struct commit_list *reverse_list(struct commit_list *list)
 {
@@ -416,7 +428,7 @@ void find_bisection(struct commit_list **commit_list, int *reaches,
 {
 	int nr, on_list;
 	struct commit_list *list, *p, *best, *next, **pnext;
-	int *weights;
+	void *weights;
 
 	init_commit_weight(&commit_weight);
 	show_list("bisection 2 entry", 0, 0, *commit_list);
@@ -441,14 +453,14 @@ void find_bisection(struct commit_list **commit_list, int *reaches,
 			nr++;
 		on_list++;
 	}
+	weights = setup_commit_weight_array(list, on_list);
 	list = reverse_list(list);
 	show_list("bisection 2 sorted", 0, nr, list);
 
 	*all = nr;
-	CALLOC_ARRAY(weights, on_list);
 
 	/* Do the real work of finding bisection commit. */
-	best = do_find_bisection(list, nr, weights, bisect_flags);
+	best = do_find_bisection(list, nr, bisect_flags);
 	if (best) {
 		if (!(bisect_flags & FIND_BISECTION_ALL)) {
 			list->item = best->item;
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 43+ messages in thread

* [PATCH 18/27] bisect: Terminate early if there are no eligible commits
  2021-11-18 16:49 Stochastic bisection support Jan Kara
                   ` (16 preceding siblings ...)
  2021-11-18 16:49 ` [PATCH 17/27] bisect: Allow more complex commit weights Jan Kara
@ 2021-11-18 16:49 ` Jan Kara
  2021-11-18 16:49 ` [PATCH 19/27] bisect: Compute reachability of tested revs Jan Kara
                   ` (11 subsequent siblings)
  29 siblings, 0 replies; 43+ messages in thread
From: Jan Kara @ 2021-11-18 16:49 UTC (permalink / raw)
  To: git; +Cc: Jan Kara

If there are no commits that can be valid bisection point, just
terminate search for bisection point early. Firstly, it just wastes
time, secondly with more complex computations for stochastic bisection
we would have to deal with this special corner-case unnecessarily.

Signed-off-by: Jan Kara <jack@suse.cz>
---
 bisect.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/bisect.c b/bisect.c
index dd2f6b68ae3d..3a26255e8650 100644
--- a/bisect.c
+++ b/bisect.c
@@ -453,12 +453,15 @@ void find_bisection(struct commit_list **commit_list, int *reaches,
 			nr++;
 		on_list++;
 	}
+	*all = nr;
+	if (!nr) {
+		*reaches = 0;
+		goto out_weights;
+	}
 	weights = setup_commit_weight_array(list, on_list);
 	list = reverse_list(list);
 	show_list("bisection 2 sorted", 0, nr, list);
 
-	*all = nr;
-
 	/* Do the real work of finding bisection commit. */
 	best = do_find_bisection(list, nr, bisect_flags);
 	if (best) {
@@ -472,6 +475,7 @@ void find_bisection(struct commit_list **commit_list, int *reaches,
 	}
 	free(weights);
 	*commit_list = best;
+out_weights:
 	clear_commit_weight(&commit_weight);
 }
 
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 43+ messages in thread

* [PATCH 19/27] bisect: Compute reachability of tested revs
  2021-11-18 16:49 Stochastic bisection support Jan Kara
                   ` (17 preceding siblings ...)
  2021-11-18 16:49 ` [PATCH 18/27] bisect: Terminate early if there are no eligible commits Jan Kara
@ 2021-11-18 16:49 ` Jan Kara
  2021-11-18 16:49 ` [PATCH 20/27] bisect: Compute probability a particular commit is bad Jan Kara
                   ` (10 subsequent siblings)
  29 siblings, 0 replies; 43+ messages in thread
From: Jan Kara @ 2021-11-18 16:49 UTC (permalink / raw)
  To: git; +Cc: Jan Kara

Compute for each rev a bitmap of revs that can reach it and that were
tested during stochastic bisection run.

Signed-off-by: Jan Kara <jack@suse.cz>
---
 bisect.c     | 199 ++++++++++++++++++++++++++++++++++++++++++++++++++-
 fixedpoint.h |  18 +++++
 2 files changed, 215 insertions(+), 2 deletions(-)

diff --git a/bisect.c b/bisect.c
index 3a26255e8650..cf11926d6f4e 100644
--- a/bisect.c
+++ b/bisect.c
@@ -17,6 +17,7 @@
 #include "object-store.h"
 #include "dir.h"
 #include "fixedpoint.h"
+#include "hashmap.h"
 
 static struct oid_array good_revs;
 static struct oid_array ptest_revs;
@@ -24,6 +25,16 @@ static struct oid_array skipped_revs;
 
 static fpnum_t result_confidence;
 
+struct tested_rev {
+	struct hashmap_entry entry;
+	struct object_id *oid;
+	int index;		/* Index in ptest_revs array */
+	fpnum_t confidence;	/* Total confidence the rev is good computed
+				 * from of all tests on this rev */
+};
+
+static struct hashmap tested_revs_map;
+
 static struct object_id *current_bad_oid;
 
 static const char *argv_checkout[] = {"checkout", "-q", NULL, "--", NULL};
@@ -80,8 +91,26 @@ static void clear_counted_flag(struct commit_list *list)
 define_commit_slab(commit_weight, void *);
 static struct commit_weight commit_weight;
 
+/*
+ * Information associated with each commit when computing stochastic bisection
+ * weights.
+ */
+struct st_weight {
+	fpnum_t weight;		/* Total weight of all reachable commits */
+	fpnum_t node_weight;	/* Weight of this particular commit */
+	unsigned long rev_bitmap[];	/* Bitmap of tested commits reaching
+					 * this commit */
+};
+int sw_rev_bmp_longs;
+
 #define DEBUG_BISECT 0
 
+#if DEBUG_BISECT > 0
+#define debug_bisect(fmt...) fprintf(stderr, fmt)
+#else
+#define debug_bisect(fmt...) do { } while (0)
+#endif
+
 static inline int has_weight(struct commit_list *elem)
 {
 	return commit_weight.slab_size > 0 &&
@@ -98,6 +127,37 @@ static inline void weight_set(struct commit_list *elem, int weight)
 	*(int *)*commit_weight_at(&commit_weight, elem->item) = weight;
 }
 
+#define BITS_PER_LONG (sizeof(long) * 8)
+
+/*
+ * Add bits in 'from' reachability bitmap to 'to'. Returns 1 if any bit in 'to'
+ * was set.
+ */
+static int sw_rev_bmp_or(struct st_weight *to, struct st_weight *from)
+{
+	int i;
+	int set = 0;
+
+	for (i = 0; i < sw_rev_bmp_longs; i++) {
+		unsigned long prev = to->rev_bitmap[i];
+
+		to->rev_bitmap[i] |= from->rev_bitmap[i];
+		set |= (to->rev_bitmap[i] != prev);
+	}
+	return set;
+}
+
+static void sw_rev_bmp_set(struct st_weight *to, int idx)
+{
+	to->rev_bitmap[idx / BITS_PER_LONG] |= 1UL << (idx % BITS_PER_LONG);
+}
+
+static unsigned long sw_rev_bmp_test(struct st_weight *to, int idx)
+{
+	return to->rev_bitmap[idx / BITS_PER_LONG] &
+					(1UL << (idx % BITS_PER_LONG));
+}
+
 static int count_interesting_parents(struct commit *commit, unsigned bisect_flags)
 {
 	struct commit_list *p;
@@ -403,6 +463,14 @@ static void *setup_commit_weight_array(struct commit_list *list, int nodes)
 	struct commit_list *p;
 	int n;
 
+	/* Stochastic bisection? */
+	if (result_confidence) {
+		int revs = ptest_revs.nr + 1;
+
+		sw_rev_bmp_longs = (revs + BITS_PER_LONG - 1) / BITS_PER_LONG;
+		entry_size = sizeof(struct st_weight) +
+					sw_rev_bmp_longs * sizeof(long);
+	}
 	array = xcalloc(nodes, entry_size);
 	for (n = 0, p = list; p; p = p->next, n++) {
 		*commit_weight_at(&commit_weight, p->item) =
@@ -411,6 +479,91 @@ static void *setup_commit_weight_array(struct commit_list *list, int nodes)
 	return array;
 }
 
+static struct tested_rev *lookup_tested_oid(struct object_id *oid)
+{
+	struct tested_rev key;
+
+	hashmap_entry_init(&key.entry, oidhash(oid));
+	key.oid = oid;
+	return hashmap_get_entry(&tested_revs_map, &key, entry, NULL);
+}
+
+static int sw_rev_test_idx(struct commit *commit)
+{
+	struct tested_rev *tr;
+
+	tr = lookup_tested_oid(&commit->object.oid);
+	if (!tr)
+		return -1;
+	return tr->index;
+}
+
+static int propagate_to_ancestors(struct commit *commit)
+{
+	struct commit_list *q;
+	int out_of_order = 0, set;
+	struct st_weight *cw;
+
+	cw = *commit_weight_at(&commit_weight, commit);
+	for (q = commit->parents; q; q = q->next) {
+		if (q->item->object.flags & UNINTERESTING)
+			continue;
+		set = sw_rev_bmp_or(*commit_weight_at(&commit_weight, q->item),
+				    cw);
+		if (set && q->item->object.flags & COUNTED) {
+			debug_bisect("descendants_bitmap: Recursion for %s\n",
+				     oid_to_hex(&q->item->object.oid));
+			/*
+			 * If the commit is already processed, we need to
+			 * propagate bitmap update to all already processed
+			 * ancestors. The 'list' should be close to inverse
+			 * topological order so this should be rare.
+			 */
+			out_of_order = 1;
+		}
+	}
+	return out_of_order;
+}
+
+/*
+ * For each commit in the 'list' compute bitmap of tested revisions that can
+ * reach it. Note for our purposes each commit can reach itself.
+ */
+static void compute_tested_descendants(struct commit_list *list)
+{
+	struct commit_list *p;
+	int retry = 0, tried = 1;
+
+	for (p = list; p; p = p->next) {
+		struct commit *commit = p->item;
+		struct st_weight *cw;
+		int idx;
+
+		cw = *commit_weight_at(&commit_weight, commit);
+		idx = sw_rev_test_idx(commit);
+		if (idx >= 0)
+			sw_rev_bmp_set(cw, idx);
+		retry = propagate_to_ancestors(commit);
+		commit->object.flags |= COUNTED;
+	}
+	clear_counted_flag(list);
+	/*
+	 * We expect the 'list' to be close to reverse topological order. If
+	 * it is not exactly in reverse topological order, we need to repeat
+	 * descendant bit propagation until bitmaps are stable.
+	 */
+	while (retry) {
+		tried++;
+		retry = 0;
+		for (p = list; p; p = p->next) {
+			retry = propagate_to_ancestors(p->item);
+			p->item->object.flags |= COUNTED;
+		}
+		clear_counted_flag(list);
+	}
+	debug_bisect("%s: Took %d iterations to stabilize.\n", __func__, tried);
+}
+
 static struct commit_list *reverse_list(struct commit_list *list)
 {
 	struct commit_list *p, *next, *last = NULL;
@@ -459,6 +612,8 @@ void find_bisection(struct commit_list **commit_list, int *reaches,
 		goto out_weights;
 	}
 	weights = setup_commit_weight_array(list, on_list);
+	if (result_confidence)
+		compute_tested_descendants(list);
 	list = reverse_list(list);
 	show_list("bisection 2 sorted", 0, nr, list);
 
@@ -522,11 +677,39 @@ static GIT_PATH_FUNC(git_path_bisect_confidences, "BISECT_CONFIDENCES")
 static GIT_PATH_FUNC(git_path_bisect_result_confidence, "BISECT_RESULT_CONFIDENCE")
 static GIT_PATH_FUNC(git_path_head_name, "head-name")
 
+static int tested_rev_cmp(const void *data, const struct hashmap_entry *ap,
+			  const struct hashmap_entry *bp, const void *keydata)
+{
+	const struct tested_rev *a, *b;
+
+	a = container_of(ap, const struct tested_rev, entry);
+	b = container_of(bp, const struct tested_rev, entry);
+
+	return oidcmp(a->oid, b->oid);
+}
+
+static void setup_tested_revs_map(void)
+{
+	int i;
+	struct tested_rev *tr;
+
+	hashmap_init(&tested_revs_map, tested_rev_cmp, NULL, ptest_revs.nr + 1);
+	for (i = 0; i < ptest_revs.nr; i++) {
+		tr = xmalloc(sizeof(struct tested_rev));
+		hashmap_entry_init(&tr->entry, oidhash(ptest_revs.oid + i));
+		tr->oid = ptest_revs.oid + i;
+		tr->index = i;
+		tr->confidence = FP_HALF;
+		hashmap_add(&tested_revs_map, &tr->entry);
+	}
+}
+
 static void read_bisect_confidences(void)
 {
 	struct strbuf str = STRBUF_INIT;
 	const char *filename = git_path_bisect_result_confidence();
 	FILE *fp = fopen(filename, "r");
+	struct tested_rev *trev;
 
 	/* Just a regular bisection? */
 	if (!fp)
@@ -535,6 +718,8 @@ static void read_bisect_confidences(void)
 		die(_("Cannot parse result confidence in file '%s'"), filename);
 	fclose(fp);
 
+	setup_tested_revs_map();
+
 	/* No uncertain bisection steps yet? */
 	filename = git_path_bisect_confidences();
 	fp = fopen(filename, "r");
@@ -542,7 +727,7 @@ static void read_bisect_confidences(void)
 		return;
 
 	while (strbuf_getline_lf(&str, fp) != EOF) {
-		fpnum_t prob;
+		fpnum_t prob, pos_prob, neg_prob;
 		char *spc;
 		char state;
 		struct object_id oid;
@@ -562,7 +747,17 @@ static void read_bisect_confidences(void)
 		if (get_oid(str.buf, &oid))
 			die(_("Cannot get oid of rev '%s' from file '%s'"),
 			    str.buf, filename);
-		/* We'll use parsed data later */
+		trev = lookup_tested_oid(&oid);
+		if (!trev)
+			die(_("Rev '%s' not tracked as ptest rev."), str.buf);
+		if (state == 'b')
+			prob = FP_ONE - prob;
+		pos_prob = fp_mul(trev->confidence, prob);
+		neg_prob = fp_mul(FP_ONE - trev->confidence, FP_ONE - prob);
+		trev->confidence = fp_div(pos_prob, pos_prob + neg_prob);
+		debug_bisect("read confidence %s: %"FPNUM_FMT
+			" (total %"FPNUM_FMT")\n",
+			str.buf, prob, trev->confidence);
 	}
 	strbuf_release(&str);
 	fclose(fp);
diff --git a/fixedpoint.h b/fixedpoint.h
index 3f6234c6530a..159bb6ef4358 100644
--- a/fixedpoint.h
+++ b/fixedpoint.h
@@ -27,4 +27,22 @@ static inline const double fp_to_double(fpnum_t n)
 #define FP_ONE frac_to_fp(1, 1)
 #define FP_HALF frac_to_fp(1, 2)
 
+/* Multiplication for numbers <= 1 */
+static inline const fpnum_t fp_mul(fpnum_t n, fpnum_t m)
+{
+	if (m == FP_ONE)
+		return n;
+	if (n == FP_ONE)
+		return m;
+	assert(m < FP_ONE && n < FP_ONE);
+	return (m * n) >> FIXEDPOINT_SHIFT;
+}
+
+/* Division for number <= 1 */
+static inline const fpnum_t fp_div(fpnum_t n, fpnum_t d)
+{
+	assert(n <= FP_ONE);
+	return (n << (FIXEDPOINT_SHIFT - 1)) / (d >> 1);
+}
+
 #endif
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 43+ messages in thread

* [PATCH 20/27] bisect: Compute probability a particular commit is bad
  2021-11-18 16:49 Stochastic bisection support Jan Kara
                   ` (18 preceding siblings ...)
  2021-11-18 16:49 ` [PATCH 19/27] bisect: Compute reachability of tested revs Jan Kara
@ 2021-11-18 16:49 ` Jan Kara
  2021-11-18 16:49 ` [PATCH 21/27] bisect: Reorganize commit weight computation Jan Kara
                   ` (9 subsequent siblings)
  29 siblings, 0 replies; 43+ messages in thread
From: Jan Kara @ 2021-11-18 16:49 UTC (permalink / raw)
  To: git; +Cc: Jan Kara

Compute conditional probability a commit is bad given results of tests
performed so far, for each commit in commit list.

Signed-off-by: Jan Kara <jack@suse.cz>
---
 bisect.c | 124 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 124 insertions(+)

diff --git a/bisect.c b/bisect.c
index cf11926d6f4e..680b96654fd4 100644
--- a/bisect.c
+++ b/bisect.c
@@ -576,6 +576,128 @@ static struct commit_list *reverse_list(struct commit_list *list)
 	return last;
 }
 
+struct sw_rev_bmp_hash_entry {
+	struct hashmap_entry entry;
+	struct st_weight *commit_weight;
+	fpnum_t cluster_p_bad;
+	unsigned int count;
+};
+
+static int sw_rev_bmp_cmp(const void *data, const struct hashmap_entry *ap,
+			  const struct hashmap_entry *bp, const void *keydata)
+{
+	int i;
+	const struct sw_rev_bmp_hash_entry *a, *b;
+
+	a = container_of(ap, const struct sw_rev_bmp_hash_entry, entry);
+	b = container_of(bp, const struct sw_rev_bmp_hash_entry, entry);
+
+	for (i = 0; i < sw_rev_bmp_longs; i++)
+		if (a->commit_weight->rev_bitmap[i] !=
+		    b->commit_weight->rev_bitmap[i])
+			return 1;
+	return 0;
+}
+
+/*
+ * Compute for each commit a probability it is the bad one given tests
+ * performed so far.
+ */
+static void compute_commit_weights(struct commit_list *list)
+{
+	struct commit_list *p;
+	struct hashmap reach_map;
+	struct sw_rev_bmp_hash_entry *found_entry, entry;
+	struct hashmap_iter reach_iter, trev_iter;
+	fpnum_t cluster_prob_sum = 0;
+
+	/*
+	 * We call a "cluster" a set of commits which have identical set of
+	 * tested revisions that can reach to it. We compute the size of each
+	 * cluster here - we count only tree-changing nodes as only those can
+	 * be the bad ones.
+	 */
+	hashmap_init(&reach_map, sw_rev_bmp_cmp, NULL, ptest_revs.nr + 1);
+	for (p = list; p; p = p->next) {
+		struct st_weight *pweight;
+		unsigned int hashval;
+
+		if (p->item->object.flags & TREESAME)
+			continue;
+
+		pweight = *commit_weight_at(&commit_weight, p->item);
+		hashval = memhash(pweight->rev_bitmap,
+				sw_rev_bmp_longs * sizeof(unsigned long));
+		hashmap_entry_init(&entry.entry, hashval);
+		entry.commit_weight = pweight;
+		found_entry = hashmap_get_entry(&reach_map, &entry, entry, NULL);
+		if (!found_entry) {
+			found_entry = xmalloc(
+					sizeof(struct sw_rev_bmp_hash_entry));
+			hashmap_entry_init(&found_entry->entry, hashval);
+			found_entry->commit_weight = pweight;
+			found_entry->count = 0;
+			hashmap_add(&reach_map, &found_entry->entry);
+		}
+		found_entry->count++;
+	}
+
+	/*
+	 * Compute probability bad commit is in a particular cluster. The
+	 * probability is:
+	 * P(error in cluster C) =
+	 *   \Pi_{i\in 'tested rev not reaching C'} P(test at i good) *
+	 *   \Pi_{i\in 'tested rev reaching C'} P(test at i bad)
+	 */
+	hashmap_for_each_entry(&reach_map, &reach_iter, found_entry, entry) {
+		fpnum_t cluster_prob = FP_ONE;
+		struct tested_rev *trev;
+
+		hashmap_for_each_entry(&tested_revs_map, &trev_iter, trev,
+				       entry) {
+			if (sw_rev_bmp_test(found_entry->commit_weight,
+					    trev->index)) {
+				cluster_prob = fp_mul(cluster_prob,
+						FP_ONE - trev->confidence);
+			} else {
+				cluster_prob = fp_mul(cluster_prob,
+						trev->confidence);
+			}
+		}
+		found_entry->cluster_p_bad = cluster_prob;
+		cluster_prob_sum += cluster_prob;
+	}
+	/*
+	 * Normalize the probabilities to sum to 1 - we need this normalization
+	 * because in fact we compute conditional probability of bad commit
+	 * being in a particular cluster given test results we already
+	 * obtained.
+	 */
+	hashmap_for_each_entry(&reach_map, &reach_iter, found_entry, entry) {
+		found_entry->cluster_p_bad = fp_div(found_entry->cluster_p_bad,
+						    cluster_prob_sum);
+	}
+
+	/* Uniformly distribute the probability among all nodes of a cluster */
+	for (p = list; p; p = p->next) {
+		struct st_weight *pweight;
+
+		if (p->item->object.flags & TREESAME)
+			continue;
+
+		pweight = *commit_weight_at(&commit_weight, p->item);
+		hashmap_entry_init(&entry.entry,
+			memhash(pweight->rev_bitmap,
+				sw_rev_bmp_longs * sizeof(unsigned long)));
+		entry.commit_weight = pweight;
+		found_entry = hashmap_get_entry(&reach_map, &entry, entry, NULL);
+		pweight->node_weight =
+			found_entry->cluster_p_bad / found_entry->count;
+	}
+
+	hashmap_clear_and_free(&reach_map, struct sw_rev_bmp_hash_entry, entry);
+}
+
 void find_bisection(struct commit_list **commit_list, int *reaches,
 		    int *all, unsigned bisect_flags)
 {
@@ -615,6 +737,8 @@ void find_bisection(struct commit_list **commit_list, int *reaches,
 	if (result_confidence)
 		compute_tested_descendants(list);
 	list = reverse_list(list);
+	if (result_confidence)
+		compute_commit_weights(list);
 	show_list("bisection 2 sorted", 0, nr, list);
 
 	/* Do the real work of finding bisection commit. */
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 43+ messages in thread

* [PATCH 21/27] bisect: Reorganize commit weight computation
  2021-11-18 16:49 Stochastic bisection support Jan Kara
                   ` (19 preceding siblings ...)
  2021-11-18 16:49 ` [PATCH 20/27] bisect: Compute probability a particular commit is bad Jan Kara
@ 2021-11-18 16:49 ` Jan Kara
  2021-11-18 16:49 ` [PATCH 22/27] bisect: Move count_distance() Jan Kara
                   ` (8 subsequent siblings)
  29 siblings, 0 replies; 43+ messages in thread
From: Jan Kara @ 2021-11-18 16:49 UTC (permalink / raw)
  To: git; +Cc: Jan Kara

Reorganize commit weight computation a bit so that it is easier to
generalize for stochastic bisection. There's no real need for two
special values (-1 and -2). Also we can set weight of leaf nodes while
computing weight of nodes with one parent. Overall the code becomes a
bit simpler.

Signed-off-by: Jan Kara <jack@suse.cz>
---
 bisect.c | 93 +++++++++++++++++++++-----------------------------------
 1 file changed, 34 insertions(+), 59 deletions(-)

diff --git a/bisect.c b/bisect.c
index 680b96654fd4..4107161c086c 100644
--- a/bisect.c
+++ b/bisect.c
@@ -158,17 +158,14 @@ static unsigned long sw_rev_bmp_test(struct st_weight *to, int idx)
 					(1UL << (idx % BITS_PER_LONG));
 }
 
-static int count_interesting_parents(struct commit *commit, unsigned bisect_flags)
+static int count_interesting_parents(struct commit *commit)
 {
 	struct commit_list *p;
 	int count;
 
-	for (count = 0, p = commit->parents; p; p = p->next) {
+	for (count = 0, p = commit->parents; p; p = p->next)
 		if (!(p->item->object.flags & UNINTERESTING))
 			count++;
-		if (bisect_flags & FIND_BISECTION_FIRST_PARENT_ONLY)
-			break;
-	}
 	return count;
 }
 
@@ -326,18 +323,14 @@ static struct commit_list *best_bisection_sorted(struct commit_list *list, int n
 	return list;
 }
 
+#define WEIGHT_UNSET -1
+
 /*
- * zero or positive weight is the number of interesting commits it can
+ * Zero or positive weight is the number of interesting commits it can
  * reach, including itself.  Especially, weight = 0 means it does not
  * reach any tree-changing commits (e.g. just above uninteresting one
- * but traversal is with pathspec).
- *
- * weight = -1 means it has one parent and its distance is yet to
- * be computed.
- *
- * weight = -2 means it has more than one parent and its distance is
- * unknown.  After running count_distance() first, they will get zero
- * or positive distance.
+ * but traversal is with pathspec). We initialize weights to a special value
+ * WEIGHT_UNSET to identify commits for which we didn't compute weight yet.
  */
 static struct commit_list *do_find_bisection(struct commit_list *list,
 					     int nr, unsigned bisect_flags)
@@ -346,32 +339,8 @@ static struct commit_list *do_find_bisection(struct commit_list *list,
 	struct commit_list *p;
 
 	counted = 0;
-
-	for (p = list; p; p = p->next) {
-		struct commit *commit = p->item;
-		unsigned commit_flags = commit->object.flags;
-
-		switch (count_interesting_parents(commit, bisect_flags)) {
-		case 0:
-			if (!(commit_flags & TREESAME)) {
-				weight_set(p, 1);
-				counted++;
-				show_list("bisection 2 count one",
-					  counted, nr, list);
-			}
-			/*
-			 * otherwise, it is known not to reach any
-			 * tree-changing commit and gets weight 0.
-			 */
-			break;
-		case 1:
-			weight_set(p, -1);
-			break;
-		default:
-			weight_set(p, -2);
-			break;
-		}
-	}
+	for (p = list; p; p = p->next)
+		weight_set(p, WEIGHT_UNSET);
 
 	show_list("bisection 2 initialize", counted, nr, list);
 
@@ -389,21 +358,21 @@ static struct commit_list *do_find_bisection(struct commit_list *list,
 	 * So we will first count distance of merges the usual
 	 * way, and then fill the blanks using cheaper algorithm.
 	 */
-	for (p = list; p; p = p->next) {
-		if (p->item->object.flags & UNINTERESTING)
-			continue;
-		if (weight(p) != -2)
-			continue;
-		if (bisect_flags & FIND_BISECTION_FIRST_PARENT_ONLY)
-			BUG("shouldn't be calling count-distance in fp mode");
-		weight_set(p, count_distance(p));
-		clear_counted_flag(list);
+	if (!(bisect_flags & FIND_BISECTION_FIRST_PARENT_ONLY)) {
+		for (p = list; p; p = p->next) {
+			if (p->item->object.flags & UNINTERESTING)
+				continue;
+			if (count_interesting_parents(p->item) <= 1)
+				continue;
+			weight_set(p, count_distance(p));
+			clear_counted_flag(list);
 
-		/* Does it happen to be at half-way? */
-		if (!(bisect_flags & FIND_BISECTION_ALL) &&
-		      approx_halfway(p, nr))
-			return p;
-		counted++;
+			/* Does it happen to be at half-way? */
+			if (!(bisect_flags & FIND_BISECTION_ALL) &&
+			      approx_halfway(p, nr))
+				return p;
+			counted++;
+		}
 	}
 
 	show_list("bisection 2 count_distance", counted, nr, list);
@@ -412,8 +381,9 @@ static struct commit_list *do_find_bisection(struct commit_list *list,
 		for (p = list; p; p = p->next) {
 			struct commit_list *q;
 			unsigned commit_flags = p->item->object.flags;
+			int parent_weight = 0;
 
-			if (0 <= weight(p))
+			if (weight(p) != WEIGHT_UNSET)
 				continue;
 
 			for (q = p->item->parents;
@@ -421,10 +391,15 @@ static struct commit_list *do_find_bisection(struct commit_list *list,
 			     q = bisect_flags & FIND_BISECTION_FIRST_PARENT_ONLY ? NULL : q->next) {
 				if (q->item->object.flags & UNINTERESTING)
 					continue;
-				if (0 <= weight(q))
+				parent_weight = weight(q);
+				if (parent_weight != WEIGHT_UNSET)
 					break;
 			}
-			if (!q)
+			/*
+			 * Only parent with unset weight? We need to compute
+			 * other weights first.
+			 */
+			if (parent_weight == WEIGHT_UNSET)
 				continue;
 
 			/*
@@ -433,13 +408,13 @@ static struct commit_list *do_find_bisection(struct commit_list *list,
 			 * otherwise inherit it from q directly.
 			 */
 			if (!(commit_flags & TREESAME)) {
-				weight_set(p, weight(q)+1);
+				weight_set(p, parent_weight + 1);
 				counted++;
 				show_list("bisection 2 count one",
 					  counted, nr, list);
 			}
 			else
-				weight_set(p, weight(q));
+				weight_set(p, parent_weight);
 
 			/* Does it happen to be at half-way? */
 			if (!(bisect_flags & FIND_BISECTION_ALL) &&
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 43+ messages in thread

* [PATCH 22/27] bisect: Move count_distance()
  2021-11-18 16:49 Stochastic bisection support Jan Kara
                   ` (20 preceding siblings ...)
  2021-11-18 16:49 ` [PATCH 21/27] bisect: Reorganize commit weight computation Jan Kara
@ 2021-11-18 16:49 ` Jan Kara
  2021-11-18 16:49 ` [PATCH 23/27] bisect: Find bisection point for stochastic weights Jan Kara
                   ` (7 subsequent siblings)
  29 siblings, 0 replies; 43+ messages in thread
From: Jan Kara @ 2021-11-18 16:49 UTC (permalink / raw)
  To: git; +Cc: Jan Kara

Move count_distance() function as stochastic bisection will need to
call weight() from it and the move avoids forward declaration. No code
changes.

Signed-off-by: Jan Kara <jack@suse.cz>
---
 bisect.c | 92 ++++++++++++++++++++++++++++----------------------------
 1 file changed, 46 insertions(+), 46 deletions(-)

diff --git a/bisect.c b/bisect.c
index 4107161c086c..8e47c3fb4b9e 100644
--- a/bisect.c
+++ b/bisect.c
@@ -42,52 +42,6 @@ static const char *argv_checkout[] = {"checkout", "-q", NULL, "--", NULL};
 static const char *term_bad;
 static const char *term_good;
 
-/* Remember to update object flag allocation in object.h */
-#define COUNTED		(1u<<16)
-
-/*
- * This is a truly stupid algorithm, but it's only
- * used for bisection, and we just don't care enough.
- *
- * We care just barely enough to avoid recursing for
- * non-merge entries.
- */
-static int count_distance(struct commit_list *entry)
-{
-	int nr = 0;
-
-	while (entry) {
-		struct commit *commit = entry->item;
-		struct commit_list *p;
-
-		if (commit->object.flags & (UNINTERESTING | COUNTED))
-			break;
-		if (!(commit->object.flags & TREESAME))
-			nr++;
-		commit->object.flags |= COUNTED;
-		p = commit->parents;
-		entry = p;
-		if (p) {
-			p = p->next;
-			while (p) {
-				nr += count_distance(p);
-				p = p->next;
-			}
-		}
-	}
-
-	return nr;
-}
-
-static void clear_counted_flag(struct commit_list *list)
-{
-	while (list) {
-		struct commit *commit = list->item;
-		commit->object.flags &= ~COUNTED;
-		list = list->next;
-	}
-}
-
 define_commit_slab(commit_weight, void *);
 static struct commit_weight commit_weight;
 
@@ -169,6 +123,52 @@ static int count_interesting_parents(struct commit *commit)
 	return count;
 }
 
+/* Remember to update object flag allocation in object.h */
+#define COUNTED		(1u<<16)
+
+/*
+ * This is a truly stupid algorithm, but it's only
+ * used for bisection, and we just don't care enough.
+ *
+ * We care just barely enough to avoid recursing for
+ * non-merge entries.
+ */
+static int count_distance(struct commit_list *entry)
+{
+	int nr = 0;
+
+	while (entry) {
+		struct commit *commit = entry->item;
+		struct commit_list *p;
+
+		if (commit->object.flags & (UNINTERESTING | COUNTED))
+			break;
+		if (!(commit->object.flags & TREESAME))
+			nr++;
+		commit->object.flags |= COUNTED;
+		p = commit->parents;
+		entry = p;
+		if (p) {
+			p = p->next;
+			while (p) {
+				nr += count_distance(p);
+				p = p->next;
+			}
+		}
+	}
+
+	return nr;
+}
+
+static void clear_counted_flag(struct commit_list *list)
+{
+	while (list) {
+		struct commit *commit = list->item;
+		commit->object.flags &= ~COUNTED;
+		list = list->next;
+	}
+}
+
 static inline int approx_halfway(struct commit_list *p, int nr)
 {
 	int diff;
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 43+ messages in thread

* [PATCH 23/27] bisect: Find bisection point for stochastic weights
  2021-11-18 16:49 Stochastic bisection support Jan Kara
                   ` (21 preceding siblings ...)
  2021-11-18 16:49 ` [PATCH 22/27] bisect: Move count_distance() Jan Kara
@ 2021-11-18 16:49 ` Jan Kara
  2021-11-18 16:49 ` [PATCH 24/27] bisect: Stop bisection when we are confident about bad commit Jan Kara
                   ` (6 subsequent siblings)
  29 siblings, 0 replies; 43+ messages in thread
From: Jan Kara @ 2021-11-18 16:49 UTC (permalink / raw)
  To: git; +Cc: Jan Kara

The algorithm for finding the next bisection point for stochastic
bisection is the same as for normal bisection. Just instead of uniform
node weight 1 we use the weights derived from computed fault
probabilities. To allow reusing as much code as possible we change
normal bisection to use fixedpoint number type (uint64_t) for
computations instead of int. It wastes 4 bytes per commit but is
probably worth the avoided duplication. We change the functions deciding
about bisection point to work on weights in [0,1] range and scale normal
bisection weights to be in this range.

Signed-off-by: Jan Kara <jack@suse.cz>
---
 bisect.c     | 138 ++++++++++++++++++++++++++++++++++-----------------
 fixedpoint.h |  10 ++++
 2 files changed, 102 insertions(+), 46 deletions(-)

diff --git a/bisect.c b/bisect.c
index 8e47c3fb4b9e..12b027b86e75 100644
--- a/bisect.c
+++ b/bisect.c
@@ -42,6 +42,12 @@ static const char *argv_checkout[] = {"checkout", "-q", NULL, "--", NULL};
 static const char *term_bad;
 static const char *term_good;
 
+/*
+ * Slab for keeping weight information associated with each commit. For normal
+ * bisection we keep just weight of each commit (fpnum_t - fraction of
+ * reachable commits), for stochastic bisection we keep struct st_weight which
+ * contains more information.
+ */
 define_commit_slab(commit_weight, void *);
 static struct commit_weight commit_weight;
 
@@ -50,7 +56,9 @@ static struct commit_weight commit_weight;
  * weights.
  */
 struct st_weight {
-	fpnum_t weight;		/* Total weight of all reachable commits */
+	fpnum_t weight;		/* Total weight of all reachable commits. This
+				 * has to be first in this struct for weight()
+				 * and weight_set() to work correctly. */
 	fpnum_t node_weight;	/* Weight of this particular commit */
 	unsigned long rev_bitmap[];	/* Bitmap of tested commits reaching
 					 * this commit */
@@ -71,14 +79,26 @@ static inline int has_weight(struct commit_list *elem)
 	       *commit_weight_at(&commit_weight, elem->item) != NULL;
 }
 
-static inline int weight(struct commit_list *elem)
+static inline fpnum_t weight(struct commit_list *elem)
+{
+	return *(fpnum_t *)*commit_weight_at(&commit_weight, elem->item);
+}
+
+static inline void weight_set(struct commit_list *elem, fpnum_t weight)
 {
-	return *(int *)*commit_weight_at(&commit_weight, elem->item);
+	*(fpnum_t *)*commit_weight_at(&commit_weight, elem->item) = weight;
 }
 
-static inline void weight_set(struct commit_list *elem, int weight)
+static inline fpnum_t node_weight(struct commit_list *elem)
 {
-	*(int *)*commit_weight_at(&commit_weight, elem->item) = weight;
+	/*
+	 * For normal bisection weight of each node is 1. For stochastic
+	 * bisection we use the weight computed based on test results.
+	 */
+	if (result_confidence)
+		return ((struct st_weight *)*commit_weight_at(&commit_weight,
+					elem->item))->node_weight;
+	return FP_ONE;
 }
 
 #define BITS_PER_LONG (sizeof(long) * 8)
@@ -133,9 +153,9 @@ static int count_interesting_parents(struct commit *commit)
  * We care just barely enough to avoid recursing for
  * non-merge entries.
  */
-static int count_distance(struct commit_list *entry)
+static fpnum_t count_distance(struct commit_list *entry)
 {
-	int nr = 0;
+	fpnum_t weight = 0;
 
 	while (entry) {
 		struct commit *commit = entry->item;
@@ -144,20 +164,20 @@ static int count_distance(struct commit_list *entry)
 		if (commit->object.flags & (UNINTERESTING | COUNTED))
 			break;
 		if (!(commit->object.flags & TREESAME))
-			nr++;
+			weight += node_weight(entry);
 		commit->object.flags |= COUNTED;
 		p = commit->parents;
 		entry = p;
 		if (p) {
 			p = p->next;
 			while (p) {
-				nr += count_distance(p);
+				weight += count_distance(p);
 				p = p->next;
 			}
 		}
 	}
 
-	return nr;
+	return weight;
 }
 
 static void clear_counted_flag(struct commit_list *list)
@@ -169,9 +189,18 @@ static void clear_counted_flag(struct commit_list *list)
 	}
 }
 
+/* Return scale of weights in commit_weight() array */
+static int weight_scale(int nr)
+{
+	if (result_confidence)
+		return 1;
+	return nr;
+}
+
 static inline int approx_halfway(struct commit_list *p, int nr)
 {
-	int diff;
+	fpnum_t diff;
+	int scale = weight_scale(nr);
 
 	/*
 	 * Don't short-cut something we are not going to return!
@@ -180,25 +209,30 @@ static inline int approx_halfway(struct commit_list *p, int nr)
 		return 0;
 	if (DEBUG_BISECT)
 		return 0;
-	/*
-	 * For small number of commits 2 and 3 are halfway of 5, and
-	 * 3 is halfway of 6 but 2 and 4 are not.
-	 */
-	diff = 2 * weight(p) - nr;
-	switch (diff) {
-	case -1: case 0: case 1:
-		return 1;
-	default:
+
+	if (!result_confidence) {
+		scale = nr;
 		/*
-		 * For large number of commits we are not so strict, it's
-		 * good enough if it's within ~0.1% of the halfway point,
-		 * e.g. 5000 is exactly halfway of 10000, but we consider
-		 * the values [4996, 5004] as halfway as well.
+		 * For small number of commits 2 and 3 are halfway of 5, and 3
+		 * is halfway of 6 but 2 and 4 are not.  For large number of
+		 * commits we are not so strict, it's good enough if it's
+		 * within ~0.1% of the halfway point, e.g. 5000 is exactly
+		 * halfway of 10000, but we consider the values [4996, 5004] as
+		 * halfway as well.
 		 */
-		if (abs(diff) < nr / 1024)
-			return 1;
-		return 0;
+		diff = frac_to_fp((nr + 1023) / 1024, nr);
+	} else {
+		/*
+		 * For stochastic we accept any node whose weight is within
+		 * 1/16 from the middle. In the worst case this may result in
+		 * ~20% more tests which is not too bad.
+		 */
+		diff = frac_to_fp(1, 16);
 	}
+	if (weight(p) > (FP_HALF - diff) * scale &&
+	    weight(p) < (FP_HALF + diff) * scale)
+		return 1;
+	return 0;
 }
 
 static void show_list(const char *debug, int counted, int nr,
@@ -227,7 +261,7 @@ static void show_list(const char *debug, int counted, int nr,
 			(commit_flags & UNINTERESTING) ? 'U' : ' ',
 			(commit_flags & COUNTED) ? 'C' : ' ');
 		if (has_weight(p))
-			fprintf(stderr, "%3d", weight(p));
+			fprintf(stderr, "%lf", fp_to_double(weight(p)));
 		else
 			fprintf(stderr, "---");
 		fprintf(stderr, " %.*s", 8, oid_to_hex(&commit->object.oid));
@@ -245,19 +279,20 @@ static void show_list(const char *debug, int counted, int nr,
 static struct commit_list *best_bisection(struct commit_list *list, int nr)
 {
 	struct commit_list *p, *best;
-	int best_distance = -1;
+	fpnum_t best_distance = -1;
+	int scale = weight_scale(nr);
 
 	best = list;
 	for (p = list; p; p = p->next) {
-		int distance;
+		fpnum_t distance;
 		unsigned commit_flags = p->item->object.flags;
 
 		if (commit_flags & TREESAME)
 			continue;
 		distance = weight(p);
-		if (nr - distance < distance)
-			distance = nr - distance;
-		if (distance > best_distance) {
+		if (distance > FP_HALF * scale)
+			distance = FP_ONE * scale - distance;
+		if (best_distance == -1 || distance > best_distance) {
 			best = p;
 			best_distance = distance;
 		}
@@ -268,7 +303,7 @@ static struct commit_list *best_bisection(struct commit_list *list, int nr)
 
 struct commit_dist {
 	struct commit *commit;
-	int distance;
+	fpnum_t distance;
 };
 
 static int compare_commit_dist(const void *a_, const void *b_)
@@ -277,27 +312,32 @@ static int compare_commit_dist(const void *a_, const void *b_)
 
 	a = (struct commit_dist *)a_;
 	b = (struct commit_dist *)b_;
-	if (a->distance != b->distance)
-		return b->distance - a->distance; /* desc sort */
+	if (a->distance != b->distance) {
+		if (a->distance > b->distance)
+			return -1;
+		return 1;
+	}
 	return oidcmp(&a->commit->object.oid, &b->commit->object.oid);
 }
 
-static struct commit_list *best_bisection_sorted(struct commit_list *list, int nr)
+static struct commit_list *best_bisection_sorted(struct commit_list *list,
+						 int nr)
 {
 	struct commit_list *p;
 	struct commit_dist *array = xcalloc(nr, sizeof(*array));
 	struct strbuf buf = STRBUF_INIT;
 	int cnt, i;
+	int scale = weight_scale(nr);
 
 	for (p = list, cnt = 0; p; p = p->next) {
-		int distance;
+		fpnum_t distance;
 		unsigned commit_flags = p->item->object.flags;
 
 		if (commit_flags & TREESAME)
 			continue;
 		distance = weight(p);
-		if (nr - distance < distance)
-			distance = nr - distance;
+		if (distance > FP_HALF * scale)
+			distance = FP_ONE * scale - distance;
 		array[cnt].commit = p->item;
 		array[cnt].distance = distance;
 		cnt++;
@@ -307,7 +347,13 @@ static struct commit_list *best_bisection_sorted(struct commit_list *list, int n
 		struct object *obj = &(array[i].commit->object);
 
 		strbuf_reset(&buf);
-		strbuf_addf(&buf, "dist=%d", array[i].distance);
+		if (!result_confidence) {
+			strbuf_addf(&buf, "dist=%u",
+				    fp_to_int(array[i].distance));
+		} else {
+			strbuf_addf(&buf, "dist=%lf",
+				    fp_to_double(array[i].distance));
+		}
 		add_name_decoration(DECORATION_NONE, buf.buf, obj);
 
 		p->item = array[i].commit;
@@ -323,7 +369,7 @@ static struct commit_list *best_bisection_sorted(struct commit_list *list, int n
 	return list;
 }
 
-#define WEIGHT_UNSET -1
+#define WEIGHT_UNSET ((fpnum_t)-1)
 
 /*
  * Zero or positive weight is the number of interesting commits it can
@@ -381,7 +427,7 @@ static struct commit_list *do_find_bisection(struct commit_list *list,
 		for (p = list; p; p = p->next) {
 			struct commit_list *q;
 			unsigned commit_flags = p->item->object.flags;
-			int parent_weight = 0;
+			fpnum_t parent_weight = 0;
 
 			if (weight(p) != WEIGHT_UNSET)
 				continue;
@@ -408,7 +454,7 @@ static struct commit_list *do_find_bisection(struct commit_list *list,
 			 * otherwise inherit it from q directly.
 			 */
 			if (!(commit_flags & TREESAME)) {
-				weight_set(p, parent_weight + 1);
+				weight_set(p, parent_weight + node_weight(p));
 				counted++;
 				show_list("bisection 2 count one",
 					  counted, nr, list);
@@ -433,7 +479,7 @@ static struct commit_list *do_find_bisection(struct commit_list *list,
 
 static void *setup_commit_weight_array(struct commit_list *list, int nodes)
 {
-	int entry_size = sizeof(int);
+	int entry_size = sizeof(fpnum_t);
 	void *array;
 	struct commit_list *p;
 	int n;
@@ -725,7 +771,7 @@ void find_bisection(struct commit_list **commit_list, int *reaches,
 			best = list;
 			best->next = NULL;
 		}
-		*reaches = weight(best);
+		*reaches = fp_to_int(weight(best) * (nr / weight_scale(nr)));
 	}
 	free(weights);
 	*commit_list = best;
diff --git a/fixedpoint.h b/fixedpoint.h
index 159bb6ef4358..6d03a5e010ee 100644
--- a/fixedpoint.h
+++ b/fixedpoint.h
@@ -14,6 +14,16 @@ static inline const fpnum_t frac_to_fp(unsigned int n, unsigned int d)
 	return (((fpnum_t)n) << FIXEDPOINT_SHIFT) / d;
 }
 
+static inline const fpnum_t int_to_fp(unsigned int n)
+{
+	return ((fpnum_t)n) << FIXEDPOINT_SHIFT;
+}
+
+static inline unsigned int fp_to_int(fpnum_t n)
+{
+	return (n + (1ULL << (FIXEDPOINT_SHIFT - 1))) >> FIXEDPOINT_SHIFT;
+}
+
 static inline const fpnum_t double_to_fp(double n)
 {
 	return (n * (1ULL << FIXEDPOINT_SHIFT));
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 43+ messages in thread

* [PATCH 24/27] bisect: Stop bisection when we are confident about bad commit
  2021-11-18 16:49 Stochastic bisection support Jan Kara
                   ` (22 preceding siblings ...)
  2021-11-18 16:49 ` [PATCH 23/27] bisect: Find bisection point for stochastic weights Jan Kara
@ 2021-11-18 16:49 ` Jan Kara
  2021-11-18 16:49 ` [PATCH 25/27] bisect: Report commit with the highest probability Jan Kara
                   ` (5 subsequent siblings)
  29 siblings, 0 replies; 43+ messages in thread
From: Jan Kara @ 2021-11-18 16:49 UTC (permalink / raw)
  To: git; +Cc: Jan Kara

When we found a commit that has high enough probability of being bad,
stop bisection and report it.

Signed-off-by: Jan Kara <jack@suse.cz>
---
 bisect.c | 18 +++++++++++++++---
 1 file changed, 15 insertions(+), 3 deletions(-)

diff --git a/bisect.c b/bisect.c
index 12b027b86e75..7da74778d780 100644
--- a/bisect.c
+++ b/bisect.c
@@ -624,7 +624,7 @@ static int sw_rev_bmp_cmp(const void *data, const struct hashmap_entry *ap,
  * Compute for each commit a probability it is the bad one given tests
  * performed so far.
  */
-static void compute_commit_weights(struct commit_list *list)
+static struct commit_list *compute_commit_weights(struct commit_list *list)
 {
 	struct commit_list *p;
 	struct hashmap reach_map;
@@ -714,9 +714,14 @@ static void compute_commit_weights(struct commit_list *list)
 		found_entry = hashmap_get_entry(&reach_map, &entry, entry, NULL);
 		pweight->node_weight =
 			found_entry->cluster_p_bad / found_entry->count;
+		/* Found node we are confident enough is bad? */
+		if (pweight->node_weight >= result_confidence)
+			break;
 	}
 
 	hashmap_clear_and_free(&reach_map, struct sw_rev_bmp_hash_entry, entry);
+
+	return p;
 }
 
 void find_bisection(struct commit_list **commit_list, int *reaches,
@@ -758,14 +763,21 @@ void find_bisection(struct commit_list **commit_list, int *reaches,
 	if (result_confidence)
 		compute_tested_descendants(list);
 	list = reverse_list(list);
-	if (result_confidence)
-		compute_commit_weights(list);
+	if (result_confidence) {
+		best = compute_commit_weights(list);
+		/* Found commit we are confident is bad? Stop bisection... */
+		if (best) {
+			oidcpy(current_bad_oid, &best->item->object.oid);
+			goto found_best;
+		}
+	}
 	show_list("bisection 2 sorted", 0, nr, list);
 
 	/* Do the real work of finding bisection commit. */
 	best = do_find_bisection(list, nr, bisect_flags);
 	if (best) {
 		if (!(bisect_flags & FIND_BISECTION_ALL)) {
+found_best:
 			list->item = best->item;
 			free_commit_list(list->next);
 			best = list;
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 43+ messages in thread

* [PATCH 25/27] bisect: Report commit with the highest probability
  2021-11-18 16:49 Stochastic bisection support Jan Kara
                   ` (23 preceding siblings ...)
  2021-11-18 16:49 ` [PATCH 24/27] bisect: Stop bisection when we are confident about bad commit Jan Kara
@ 2021-11-18 16:49 ` Jan Kara
  2021-11-18 16:49 ` [PATCH 26/27] bisect: Debug stochastic bisection Jan Kara
                   ` (4 subsequent siblings)
  29 siblings, 0 replies; 43+ messages in thread
From: Jan Kara @ 2021-11-18 16:49 UTC (permalink / raw)
  To: git; +Cc: Jan Kara

For stochastic bisection it does not make sense to report number of
commits to investigate (since we always consider all). What we can do
though is to report the highest probability some commit has of being
bad. That also indicates how far we are from the end of bisection as
once the probability reaches desired result confidence bisection stops.

Signed-off-by: Jan Kara <jack@suse.cz>
---
 bisect.c           | 56 +++++++++++++++++++++++++++++++++-------------
 bisect.h           |  4 +++-
 builtin/rev-list.c |  6 +++--
 3 files changed, 48 insertions(+), 18 deletions(-)

diff --git a/bisect.c b/bisect.c
index 7da74778d780..6ed740106795 100644
--- a/bisect.c
+++ b/bisect.c
@@ -724,7 +724,25 @@ static struct commit_list *compute_commit_weights(struct commit_list *list)
 	return p;
 }
 
-void find_bisection(struct commit_list **commit_list, int *reaches,
+static fpnum_t heaviest_commit(struct commit_list *list)
+{
+	fpnum_t best_weight = 0;
+	struct commit_list *p;
+
+	for (p = list; p; p = p->next) {
+		struct st_weight *pweight;
+
+		if (p->item->object.flags & TREESAME)
+			continue;
+
+		pweight = *commit_weight_at(&commit_weight, p->item);
+		if (pweight->node_weight > best_weight)
+			best_weight = pweight->node_weight;
+	}
+	return best_weight;
+}
+
+void find_bisection(struct commit_list **commit_list, fpnum_t *reaches,
 		    int *all, unsigned bisect_flags)
 {
 	int nr, on_list;
@@ -770,6 +788,7 @@ void find_bisection(struct commit_list **commit_list, int *reaches,
 			oidcpy(current_bad_oid, &best->item->object.oid);
 			goto found_best;
 		}
+		*reaches = heaviest_commit(list);
 	}
 	show_list("bisection 2 sorted", 0, nr, list);
 
@@ -783,7 +802,8 @@ void find_bisection(struct commit_list **commit_list, int *reaches,
 			best = list;
 			best->next = NULL;
 		}
-		*reaches = fp_to_int(weight(best) * (nr / weight_scale(nr)));
+		if (!result_confidence)
+			*reaches = weight(best) / nr;
 	}
 	free(weights);
 	*commit_list = best;
@@ -1457,7 +1477,8 @@ enum bisect_error bisect_next_all(struct repository *r, const char *prefix)
 {
 	struct rev_info revs;
 	struct commit_list *tried;
-	int reaches = 0, all = 0, nr, steps;
+	int all = 0, nr, steps;
+	fpnum_t reaches = 0;
 	enum bisect_error res = BISECT_OK;
 	struct object_id *bisect_rev;
 	char *steps_msg;
@@ -1536,19 +1557,24 @@ enum bisect_error bisect_next_all(struct repository *r, const char *prefix)
 		return BISECT_INTERNAL_SUCCESS_1ST_BAD_FOUND;
 	}
 
-	nr = all - reaches - 1;
-	steps = estimate_bisect_steps(all);
+	if (!result_confidence) {
+		nr = all - fp_to_int(reaches * all) - 1;
+		steps = estimate_bisect_steps(all);
 
-	steps_msg = xstrfmt(Q_("(roughly %d step)", "(roughly %d steps)",
-		  steps), steps);
-	/*
-	 * TRANSLATORS: the last %s will be replaced with "(roughly %d
-	 * steps)" translation.
-	 */
-	printf(Q_("Bisecting: %d revision left to test after this %s\n",
-		  "Bisecting: %d revisions left to test after this %s\n",
-		  nr), nr, steps_msg);
-	free(steps_msg);
+		steps_msg = xstrfmt(Q_("(roughly %d step)", "(roughly %d steps)",
+			  steps), steps);
+		/*
+		 * TRANSLATORS: the last %s will be replaced with "(roughly %d
+		 * steps)" translation.
+		 */
+		printf(Q_("Bisecting: %d revision left to test after this %s\n",
+			  "Bisecting: %d revisions left to test after this %s\n",
+			  nr), nr, steps_msg);
+		free(steps_msg);
+	} else {
+		printf(_("Bisecting: most probable commit has probability %lf\n"),
+			fp_to_double(reaches));
+	}
 	/* Clean up objects used, as they will be reused. */
 	repo_clear_commit_marks(r, ALL_REV_FLAGS);
 
diff --git a/bisect.h b/bisect.h
index ec24ac2d7ee9..d0a805b05041 100644
--- a/bisect.h
+++ b/bisect.h
@@ -1,6 +1,8 @@
 #ifndef BISECT_H
 #define BISECT_H
 
+#include "fixedpoint.h"
+
 struct commit_list;
 struct repository;
 
@@ -11,7 +13,7 @@ struct repository;
  * Otherwise, it will be either all non-SAMETREE commits or the single
  * best commit, as chosen by `find_all`.
  */
-void find_bisection(struct commit_list **list, int *reaches, int *all,
+void find_bisection(struct commit_list **list, fpnum_t *reaches, int *all,
 		    unsigned bisect_flags);
 
 struct commit_list *filter_skipped(struct commit_list *list,
diff --git a/builtin/rev-list.c b/builtin/rev-list.c
index 36cb909ebaa5..c98ac9e91688 100644
--- a/builtin/rev-list.c
+++ b/builtin/rev-list.c
@@ -702,7 +702,8 @@ int cmd_rev_list(int argc, const char **argv, const char *prefix)
 		mark_edges_uninteresting(&revs, show_edge, 0);
 
 	if (bisect_list) {
-		int reaches, all;
+		int all;
+		fpnum_t reaches;
 		unsigned bisect_flags = 0;
 
 		if (bisect_find_all)
@@ -714,7 +715,8 @@ int cmd_rev_list(int argc, const char **argv, const char *prefix)
 		find_bisection(&revs.commits, &reaches, &all, bisect_flags);
 
 		if (bisect_show_vars)
-			return show_bisect_vars(&info, reaches, all);
+			return show_bisect_vars(&info, fp_to_int(reaches * all),
+						all);
 	}
 
 	if (filter_provided_objects) {
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 43+ messages in thread

* [PATCH 26/27] bisect: Debug stochastic bisection
  2021-11-18 16:49 Stochastic bisection support Jan Kara
                   ` (24 preceding siblings ...)
  2021-11-18 16:49 ` [PATCH 25/27] bisect: Report commit with the highest probability Jan Kara
@ 2021-11-18 16:49 ` Jan Kara
  2021-11-18 16:49 ` [PATCH 27/27] bisect: Allow bisection debugging of approx_halfway() Jan Kara
                   ` (3 subsequent siblings)
  29 siblings, 0 replies; 43+ messages in thread
From: Jan Kara @ 2021-11-18 16:49 UTC (permalink / raw)
  To: git; +Cc: Jan Kara

Add debug prints for debugging stochastic bisection.

Signed-off-by: Jan Kara <jack@suse.cz>
---
 bisect.c | 56 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 56 insertions(+)

diff --git a/bisect.c b/bisect.c
index 6ed740106795..4d2ba5dbd77e 100644
--- a/bisect.c
+++ b/bisect.c
@@ -620,6 +620,21 @@ static int sw_rev_bmp_cmp(const void *data, const struct hashmap_entry *ap,
 	return 0;
 }
 
+static void show_cluster(struct sw_rev_bmp_hash_entry *entry)
+{
+	int i;
+
+	if (!DEBUG_BISECT)
+		return;
+
+	fprintf(stderr, "cluster prob %"FPNUM_FMT" members %u reaching revs ",
+		entry->cluster_p_bad, entry->count);
+	for (i = 0; i < ptest_revs.nr; i++)
+		if (sw_rev_bmp_test(entry->commit_weight, i))
+			fprintf(stderr, "%.8s ", oid_to_hex(ptest_revs.oid + i));
+	fprintf(stderr, "\n");
+}
+
 /*
  * Compute for each commit a probability it is the bad one given tests
  * performed so far.
@@ -697,6 +712,7 @@ static struct commit_list *compute_commit_weights(struct commit_list *list)
 	hashmap_for_each_entry(&reach_map, &reach_iter, found_entry, entry) {
 		found_entry->cluster_p_bad = fp_div(found_entry->cluster_p_bad,
 						    cluster_prob_sum);
+		show_cluster(found_entry);
 	}
 
 	/* Uniformly distribute the probability among all nodes of a cluster */
@@ -1111,6 +1127,30 @@ static struct commit_list *managed_skipped(struct commit_list *list,
 	return skip_away(list, count);
 }
 
+static void print_object(struct object *obj)
+{
+	unsigned flags = obj->flags;
+
+	fprintf(stderr, "%c%c%c%c ",
+		(flags & TREESAME) ? ' ' : 'T',
+		(flags & UNINTERESTING) ? 'U' : ' ',
+		(flags & SEEN) ? 'S' : ' ',
+		(flags & COUNTED) ? 'C' : ' ');
+	fprintf(stderr, "%.8s\n", oid_to_hex(&obj->oid));
+}
+
+static void show_object_array(const char *str, struct object_array *array)
+{
+	int i;
+
+	if (!DEBUG_BISECT)
+		return;
+
+	fprintf(stderr, "%s\n", str);
+	for (i = 0; i < array->nr; i++)
+		print_object(array->objects[i].item);
+}
+
 static void bisect_rev_setup(struct repository *r, struct rev_info *revs,
 			     const char *prefix,
 			     const char *bad_format, const char *good_format,
@@ -1139,12 +1179,16 @@ static void bisect_rev_setup(struct repository *r, struct rev_info *revs,
 
 	setup_revisions(rev_argv.nr, rev_argv.v, revs, NULL);
 	/* XXX leak rev_argv, as "revs" may still be pointing to it */
+	show_list("setup_revisions commits", 0, 0, revs->commits);
+	show_object_array("setup_revisions pending", &revs->pending);
 }
 
 static void bisect_common(struct rev_info *revs)
 {
 	if (prepare_revision_walk(revs))
 		die("revision walk setup failed");
+	show_list("bisect_common commits", 0, 0, revs->commits);
+	show_object_array("bisect_common pending", &revs->pending);
 	if (revs->tree_objects)
 		mark_edges_uninteresting(revs, NULL, 0);
 }
@@ -1462,6 +1506,12 @@ void read_bisect_terms(const char **read_bad, const char **read_good)
 	fclose(fp);
 }
 
+static int debug_print_oid(const struct object_id *oid, void *data)
+{
+	fprintf(stderr, "%s\n", oid_to_hex(oid));
+	return 0;
+}
+
 /*
  * We use the convention that return BISECT_INTERNAL_SUCCESS_1ST_BAD_FOUND (-10) means
  * the bisection process finished successfully.
@@ -1493,6 +1543,12 @@ enum bisect_error bisect_next_all(struct repository *r, const char *prefix)
 	if (read_bisect_refs())
 		die(_("reading bisect refs failed"));
 	read_bisect_confidences();
+	if (DEBUG_BISECT) {
+		debug_bisect("good revs:\n");
+		oid_array_for_each(&good_revs, debug_print_oid, NULL);
+		debug_bisect("p-test revs:\n");
+		oid_array_for_each(&ptest_revs, debug_print_oid, NULL);
+	}
 
 	if (file_exists(git_path_bisect_first_parent()))
 		bisect_flags |= FIND_BISECTION_FIRST_PARENT_ONLY;
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 43+ messages in thread

* [PATCH 27/27] bisect: Allow bisection debugging of approx_halfway()
  2021-11-18 16:49 Stochastic bisection support Jan Kara
                   ` (25 preceding siblings ...)
  2021-11-18 16:49 ` [PATCH 26/27] bisect: Debug stochastic bisection Jan Kara
@ 2021-11-18 16:49 ` Jan Kara
  2021-11-18 22:49 ` Stochastic bisection support Taylor Blau
                   ` (2 subsequent siblings)
  29 siblings, 0 replies; 43+ messages in thread
From: Jan Kara @ 2021-11-18 16:49 UTC (permalink / raw)
  To: git; +Cc: Jan Kara

Currently approx_halfway() is shortcircuited when bisection debugging is
turned on. Allow possibility of printing all debug messages while
keeping approx_halfway() working by setting DEBUG_BISECT to 1. Disable
approx_halfway() only when BISECT_DEBUG is larger than 1. Also add a
debug message to approx_halfway() to dump info about commit it is
selecting.

Signed-off-by: Jan Kara <jack@suse.cz>
---
 bisect.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/bisect.c b/bisect.c
index 4d2ba5dbd77e..60e8c32056ac 100644
--- a/bisect.c
+++ b/bisect.c
@@ -207,7 +207,7 @@ static inline int approx_halfway(struct commit_list *p, int nr)
 	 */
 	if (p->item->object.flags & TREESAME)
 		return 0;
-	if (DEBUG_BISECT)
+	if (DEBUG_BISECT > 1)
 		return 0;
 
 	if (!result_confidence) {
@@ -230,8 +230,11 @@ static inline int approx_halfway(struct commit_list *p, int nr)
 		diff = frac_to_fp(1, 16);
 	}
 	if (weight(p) > (FP_HALF - diff) * scale &&
-	    weight(p) < (FP_HALF + diff) * scale)
+	    weight(p) < (FP_HALF + diff) * scale) {
+		debug_bisect("found approx half: %lf diff %lf\n",
+			     fp_to_double(weight(p)), fp_to_double(diff));
 		return 1;
+	}
 	return 0;
 }
 
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 43+ messages in thread

* Re: [PATCH 01/27] bisect: Fixup test rev-list-bisect/02
  2021-11-18 16:49 ` [PATCH 01/27] bisect: Fixup test rev-list-bisect/02 Jan Kara
@ 2021-11-18 20:08   ` Chris Torek
  2021-11-19 16:31     ` Johannes Schindelin
  0 siblings, 1 reply; 43+ messages in thread
From: Chris Torek @ 2021-11-18 20:08 UTC (permalink / raw)
  To: Jan Kara; +Cc: Git List

On Thu, Nov 18, 2021 at 10:38 AM Jan Kara <jack@suse.cz> wrote:
> diff --git a/t/t6002-rev-list-bisect.sh b/t/t6002-rev-list-bisect.sh
> index b95a0212adff..48db52447fd3 100755
> --- a/t/t6002-rev-list-bisect.sh
> +++ b/t/t6002-rev-list-bisect.sh
> @@ -247,8 +247,9 @@ test_expect_success 'set up fake --bisect refs' '
>  test_expect_success 'rev-list --bisect can default to good/bad refs' '
>         # the only thing between c3 and c1 is c2
>         git rev-parse c2 >expect &&
> -       git rev-list --bisect >actual &&
> -       test_cmp expect actual
> +       git rev-parse b2 >>expect &&
> +       actual=$(git rev-list --bisect) &&
> +       grep &>/dev/null $actual expect

`&>` is a bashism; you need `>/dev/null 2>&1` here for general portability.

Chris

ref: https://unix.stackexchange.com/questions/581507/redirecting-stdout-and-stderr-together-vs-redirecting-stdout-and-then-stderr-to/

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 03/27] bisect: Fixup test bisect-porcelain/20
  2021-11-18 16:49 ` [PATCH 03/27] bisect: Fixup test bisect-porcelain/20 Jan Kara
@ 2021-11-18 20:13   ` Chris Torek
  2021-11-18 22:10     ` Taylor Blau
  0 siblings, 1 reply; 43+ messages in thread
From: Chris Torek @ 2021-11-18 20:13 UTC (permalink / raw)
  To: Jan Kara; +Cc: Git List

On Thu, Nov 18, 2021 at 10:39 AM Jan Kara <jack@suse.cz> wrote:
> diff --git a/t/t6030-bisect-porcelain.sh b/t/t6030-bisect-porcelain.sh
> index f8cfdd3c36d2..13f7deea4d81 100755
> --- a/t/t6030-bisect-porcelain.sh
> +++ b/t/t6030-bisect-porcelain.sh
> @@ -240,8 +240,13 @@ test_expect_success 'bisect skip: cannot tell between 3 commits' '
>  test_expect_success 'bisect skip: cannot tell between 2 commits' '
>         test_when_finished git bisect reset &&
>         git bisect start $HASH4 $HASH1 &&
> -       git bisect skip &&
> -       test_expect_code 2 git bisect good >my_bisect_log.txt &&
> +       if [ $(git rev-parse HEAD) == $HASH2 ]; then
> +               results=('good' 'skip')
> +       else
> +               results=('skip' 'good')
> +       fi &&
> +       git bisect ${results[0]} &&
> +       test_expect_code 2 git bisect ${results[1]} >my_bisect_log.txt &&

These are also not available in old POSIX shell - consider using two
separate variables to hold the two strings.

Chris

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 02/27] bisect: Fixup bisect-porcelain/17
  2021-11-18 16:49 ` [PATCH 02/27] bisect: Fixup bisect-porcelain/17 Jan Kara
@ 2021-11-18 22:05   ` Taylor Blau
  2021-11-22 12:27     ` Jan Kara
  0 siblings, 1 reply; 43+ messages in thread
From: Taylor Blau @ 2021-11-18 22:05 UTC (permalink / raw)
  To: Jan Kara; +Cc: git

On Thu, Nov 18, 2021 at 05:49:15PM +0100, Jan Kara wrote:

> Test 17 from t6030-bisect-porcelain.sh assumes that bisection algorithm
> suggests first HASH3 where HASH2 and HASH3 are equivalent choices. Make
> sure test correctly handles both choices, add test variant to properly
> test commit skipping in the second case.

OK, makes sense-ish: at least in the context of preparing for the
bisection algorithm to change. The subject line leaves a bit to be
desired, though. Perhaps:

  t6030: handle equivalent bisection points gracefully

> diff --git a/t/t6030-bisect-porcelain.sh b/t/t6030-bisect-porcelain.sh
> index 1be85d064e76..f8cfdd3c36d2 100755
> --- a/t/t6030-bisect-porcelain.sh
> +++ b/t/t6030-bisect-porcelain.sh
> @@ -197,11 +197,27 @@ test_expect_success 'bisect skip: successful result' '
>  	test_when_finished git bisect reset &&
>  	git bisect reset &&
>  	git bisect start $HASH4 $HASH1 &&
> -	git bisect skip &&
> +	if [ $(git rev-parse HEAD) == $HASH3 ]; then

This is somewhat uncommon style for Git's test suite. It might be more
appropriate to write instead:

    if test "$HASH3" = "$(git rev-parse HEAD)"
    then
      git bisect skip
    fi &&
    # ...

> +# $HASH1 is good, $HASH4 is bad, we skip $HASH2
> +# but $HASH3 is good,

It looks like this comment should have gone above the start of the test
in the previous hunk.

But it looks like you accidentally duplicated this test in its entirety
(with the addition of the misplaced comment) below instead.

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 03/27] bisect: Fixup test bisect-porcelain/20
  2021-11-18 20:13   ` Chris Torek
@ 2021-11-18 22:10     ` Taylor Blau
  2021-11-22 12:49       ` Jan Kara
  0 siblings, 1 reply; 43+ messages in thread
From: Taylor Blau @ 2021-11-18 22:10 UTC (permalink / raw)
  To: Chris Torek; +Cc: Jan Kara, Git List

On Thu, Nov 18, 2021 at 12:13:21PM -0800, Chris Torek wrote:
> On Thu, Nov 18, 2021 at 10:39 AM Jan Kara <jack@suse.cz> wrote:
> > diff --git a/t/t6030-bisect-porcelain.sh b/t/t6030-bisect-porcelain.sh
> > index f8cfdd3c36d2..13f7deea4d81 100755
> > --- a/t/t6030-bisect-porcelain.sh
> > +++ b/t/t6030-bisect-porcelain.sh
> > @@ -240,8 +240,13 @@ test_expect_success 'bisect skip: cannot tell between 3 commits' '
> >  test_expect_success 'bisect skip: cannot tell between 2 commits' '
> >         test_when_finished git bisect reset &&
> >         git bisect start $HASH4 $HASH1 &&
> > -       git bisect skip &&
> > -       test_expect_code 2 git bisect good >my_bisect_log.txt &&
> > +       if [ $(git rev-parse HEAD) == $HASH2 ]; then
> > +               results=('good' 'skip')
> > +       else
> > +               results=('skip' 'good')
> > +       fi &&
> > +       git bisect ${results[0]} &&
> > +       test_expect_code 2 git bisect ${results[1]} >my_bisect_log.txt &&
>
> These are also not available in old POSIX shell - consider using two
> separate variables to hold the two strings.

Or just inlining the commands that you actually want to run inside of
the if statement above:

    if test "$HASH2" = "$(git rev-parse HEAD)
    then
      git bisect good &&
      test_expect_code 2 git bisect skip >my_bisect_log.txt
    else
      git bisect skip &&
      test_expect_code 2 git bisect good >my_bisect_log.txt
    fi && #...

Here (and in the previous patch) it might be helpful to add a short note
in these conditionals, maybe along the lines of:

    # HASH2 and HASH3 are equivalent choices, but we only want to mark
    # HASH2 as "good". Handle either ordering:

Same note on the brevity of the subject line applies here, too.

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: Stochastic bisection support
  2021-11-18 16:49 Stochastic bisection support Jan Kara
                   ` (26 preceding siblings ...)
  2021-11-18 16:49 ` [PATCH 27/27] bisect: Allow bisection debugging of approx_halfway() Jan Kara
@ 2021-11-18 22:49 ` Taylor Blau
  2021-11-22 12:13   ` Jan Kara
  2021-11-19 16:39 ` Johannes Schindelin
  2021-11-22 12:55 ` Christian Couder
  29 siblings, 1 reply; 43+ messages in thread
From: Taylor Blau @ 2021-11-18 22:49 UTC (permalink / raw)
  To: Jan Kara; +Cc: git

On Thu, Nov 18, 2021 at 05:49:13PM +0100, Jan Kara wrote:

> The first part of the series improves some tests so that they accept
> other valid decisions for bisection points. This is needed because to
> make it easier to share some logic between normal and stochastic
> bisection, I needed to slightly change some bits for normal bisection
> and then since commit weights will be computed in a somewhat different
> order, also chosen bisection points are sometimes different.

I have only looked through a couple of the first half of your patches,
but I'm not sure I understand why non-stochastic bisection needs to
change at all in order to support stochastic bisection.

In other words, if we're tweaking all of these tests to allow picking
equivalent bisection points, why can't we simply leave them alone? It
would be nice if normal bisection didn't change as a result of adding a
new feature on top.

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 01/27] bisect: Fixup test rev-list-bisect/02
  2021-11-18 20:08   ` Chris Torek
@ 2021-11-19 16:31     ` Johannes Schindelin
  2021-11-22 12:48       ` Jan Kara
  0 siblings, 1 reply; 43+ messages in thread
From: Johannes Schindelin @ 2021-11-19 16:31 UTC (permalink / raw)
  To: Chris Torek; +Cc: Jan Kara, Git List

Hi,

On Thu, 18 Nov 2021, Chris Torek wrote:

> On Thu, Nov 18, 2021 at 10:38 AM Jan Kara <jack@suse.cz> wrote:
> > diff --git a/t/t6002-rev-list-bisect.sh b/t/t6002-rev-list-bisect.sh
> > index b95a0212adff..48db52447fd3 100755
> > --- a/t/t6002-rev-list-bisect.sh
> > +++ b/t/t6002-rev-list-bisect.sh
> > @@ -247,8 +247,9 @@ test_expect_success 'set up fake --bisect refs' '
> >  test_expect_success 'rev-list --bisect can default to good/bad refs' '
> >         # the only thing between c3 and c1 is c2
> >         git rev-parse c2 >expect &&
> > -       git rev-list --bisect >actual &&
> > -       test_cmp expect actual
> > +       git rev-parse b2 >>expect &&
> > +       actual=$(git rev-list --bisect) &&
> > +       grep &>/dev/null $actual expect
>
> `&>` is a bashism; you need `>/dev/null 2>&1` here for general portability.

More importantly, why do you suppress the output in the first place? This
will make debugging breakages harder.

Let's just not redirect the output?

I do see a more structural problem here, though. Throughout the test
suite, it is our custom to generate files called `expect` with what we
consider the expected output, and then generate `actual` with the actual
output. We then compare the results and complain if they are not
identical.

With this patch, we break that paradigm. All of a sudden, `expect` is not
at all the expected output anymore, but a haystack in which we want to
find one thing.

And even after reading the commit message twice, I am unconvinced that b2
(whatever that is) might be an equally good choice. I become even more
doubtful about that statement when I look at the code comment at the
beginning of the test case:

	# the only thing between c3 and c1 is c2

So either this code comment is wrong, or the patch. And if the code
comment is wrong, I would like to know when it became wrong, and how, and
why it slipped through our review.

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: Stochastic bisection support
  2021-11-18 16:49 Stochastic bisection support Jan Kara
                   ` (27 preceding siblings ...)
  2021-11-18 22:49 ` Stochastic bisection support Taylor Blau
@ 2021-11-19 16:39 ` Johannes Schindelin
  2021-11-20  7:54   ` Chris Torek
  2021-11-22 12:55 ` Christian Couder
  29 siblings, 1 reply; 43+ messages in thread
From: Johannes Schindelin @ 2021-11-19 16:39 UTC (permalink / raw)
  To: Jan Kara; +Cc: git

Hi Jan,

On Thu, 18 Nov 2021, Jan Kara wrote:

> In some cases regressions (or generally changes) we are trying to bisect have
> probabilistic nature. This can for example happen for hard to trigger race
> condition where it is difficult to distinguish working state from just not
> hitting the race or it can happen for performance regressions where it is
> sometimes difficult to distinguish random workload fluctuations from the
> regression we are looking for. With standard bisection the only option we have
> is to repeatedly test suggested bisection point until we are sure enough which
> way to go. This leads to rather long bisection times and still a single wrong
> decision whether a commit is good to bad renders the whole bisection useless.
>
> Stochastic bisection tries to address these problems. When deciding whether a
> commit is good or bad, you can also specify your confidence in the decision.
> For performance tests you can usually directly infer this confidence from the
> distance of your current result from good/bad values, for hard to reproduce
> races you are usually 100% confident for bad commits, for good commits you need
> to somehow estimate your confidence based on past experience with reproducing
> the issue. The stochastic bisection algorithm then uses these test results
> and confidences to suggest next commit to try, tracking for each commit the
> probability the commit is the bad one given current test results. Once some
> commit reaches high enough probability (set when starting bisection) of being
> the bad one, we stop bisecting and annouce this commit.

An interesting problem, for sure!

It is slightly related to a scenario that has been described to me
recently: in a gigantic project whose full test suite is too large to run
with every Pull Request, where tests are more likely to become flaky
rather than simply break, a stochastic CI regime was introduced where a
semi-random subset of the test suite is run with every CI build. That team
also came up with the concept of attaching confidences as you describe.

I only had time to look at the first patch closely so far. I hope to find
more time next week to review further.

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: Stochastic bisection support
  2021-11-19 16:39 ` Johannes Schindelin
@ 2021-11-20  7:54   ` Chris Torek
  2021-11-22 11:57     ` Jan Kara
  0 siblings, 1 reply; 43+ messages in thread
From: Chris Torek @ 2021-11-20  7:54 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: Jan Kara, Git List

On Fri, Nov 19, 2021 at 8:51 AM Johannes Schindelin
<Johannes.Schindelin@gmx.de> wrote:
> An interesting problem, for sure!
>
> It is slightly related to a scenario that has been described to me
> recently: in a gigantic project whose full test suite is too large to run
> with every Pull Request, where tests are more likely to become flaky
> rather than simply break, a stochastic CI regime was introduced where a
> semi-random subset of the test suite is run with every CI build. That team
> also came up with the concept of attaching confidences as you describe.
>
> I only had time to look at the first patch closely so far. I hope to find
> more time next week to review further.

I only scanned for obvious items myself, but the idea of a
probabilistic test is indeed interesting.

I do wonder why you (Jan Kara, not Dscho :-) ) used fixed-point
arithmetic here though.

Chris

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: Stochastic bisection support
  2021-11-20  7:54   ` Chris Torek
@ 2021-11-22 11:57     ` Jan Kara
  0 siblings, 0 replies; 43+ messages in thread
From: Jan Kara @ 2021-11-22 11:57 UTC (permalink / raw)
  To: Chris Torek; +Cc: Johannes Schindelin, Jan Kara, Git List

On Fri 19-11-21 23:54:12, Chris Torek wrote:
> On Fri, Nov 19, 2021 at 8:51 AM Johannes Schindelin
> <Johannes.Schindelin@gmx.de> wrote:
> > An interesting problem, for sure!
> >
> > It is slightly related to a scenario that has been described to me
> > recently: in a gigantic project whose full test suite is too large to run
> > with every Pull Request, where tests are more likely to become flaky
> > rather than simply break, a stochastic CI regime was introduced where a
> > semi-random subset of the test suite is run with every CI build. That team
> > also came up with the concept of attaching confidences as you describe.
> >
> > I only had time to look at the first patch closely so far. I hope to find
> > more time next week to review further.
> 
> I only scanned for obvious items myself, but the idea of a
> probabilistic test is indeed interesting.
> 
> I do wonder why you (Jan Kara, not Dscho :-) ) used fixed-point
> arithmetic here though.

Ah, very good question. I guess because I'm primarily Linux kernel
programmer and we don't have floating point arithmetics in the kernel so
I'm used to fixedpoint :).  Secondarily because fixedpoint arithmetics
provides me more control over where I'm loosing precision but looking at
this now I agree the ugliness in the code is not probably worth the
imagined win.  I'll rewrite stuff using floating point. Thanks for the
suggestion.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: Stochastic bisection support
  2021-11-18 22:49 ` Stochastic bisection support Taylor Blau
@ 2021-11-22 12:13   ` Jan Kara
  0 siblings, 0 replies; 43+ messages in thread
From: Jan Kara @ 2021-11-22 12:13 UTC (permalink / raw)
  To: Taylor Blau; +Cc: Jan Kara, git

On Thu 18-11-21 17:49:48, Taylor Blau wrote:
> On Thu, Nov 18, 2021 at 05:49:13PM +0100, Jan Kara wrote:
> 
> > The first part of the series improves some tests so that they accept
> > other valid decisions for bisection points. This is needed because to
> > make it easier to share some logic between normal and stochastic
> > bisection, I needed to slightly change some bits for normal bisection
> > and then since commit weights will be computed in a somewhat different
> > order, also chosen bisection points are sometimes different.
> 
> I have only looked through a couple of the first half of your patches,
> but I'm not sure I understand why non-stochastic bisection needs to
> change at all in order to support stochastic bisection.
> 
> In other words, if we're tweaking all of these tests to allow picking
> equivalent bisection points, why can't we simply leave them alone? It
> would be nice if normal bisection didn't change as a result of adding a
> new feature on top.

The big part of why results for normal bisection change are the changes in
"bisect: Reorganize commit weight computation" to function
do_find_bisection() where previously we didn't call approx_halfway() on the
commits at the end of chain (looks like unintended omission) while after my
changes we call approx_halfway() for all commits. And I have reorganized
do_find_bisection() because I reuse it for stochastic bisection as well and
the code is IMO easier to understand after reorganization so it is still
comprehensible after I add there complexity of stochastic bisection.

I understand the churn in the tests is unwelcome but long-term it seems
like a low enough cost to pay for more maintainable code. But if git
maintainers think otherwise I can try keeping classic bisection code
decisions without changes. Just let me know what you prefer.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 02/27] bisect: Fixup bisect-porcelain/17
  2021-11-18 22:05   ` Taylor Blau
@ 2021-11-22 12:27     ` Jan Kara
  0 siblings, 0 replies; 43+ messages in thread
From: Jan Kara @ 2021-11-22 12:27 UTC (permalink / raw)
  To: Taylor Blau; +Cc: Jan Kara, git

On Thu 18-11-21 17:05:41, Taylor Blau wrote:
> On Thu, Nov 18, 2021 at 05:49:15PM +0100, Jan Kara wrote:
> 
> > Test 17 from t6030-bisect-porcelain.sh assumes that bisection algorithm
> > suggests first HASH3 where HASH2 and HASH3 are equivalent choices. Make
> > sure test correctly handles both choices, add test variant to properly
> > test commit skipping in the second case.
> 
> OK, makes sense-ish: at least in the context of preparing for the
> bisection algorithm to change. The subject line leaves a bit to be
> desired, though. Perhaps:
> 
>   t6030: handle equivalent bisection points gracefully

Sure, I can improve all subjects to test updates like this.

> > diff --git a/t/t6030-bisect-porcelain.sh b/t/t6030-bisect-porcelain.sh
> > index 1be85d064e76..f8cfdd3c36d2 100755
> > --- a/t/t6030-bisect-porcelain.sh
> > +++ b/t/t6030-bisect-porcelain.sh
> > @@ -197,11 +197,27 @@ test_expect_success 'bisect skip: successful result' '
> >  	test_when_finished git bisect reset &&
> >  	git bisect reset &&
> >  	git bisect start $HASH4 $HASH1 &&
> > -	git bisect skip &&
> > +	if [ $(git rev-parse HEAD) == $HASH3 ]; then
> 
> This is somewhat uncommon style for Git's test suite. It might be more
> appropriate to write instead:
> 
>     if test "$HASH3" = "$(git rev-parse HEAD)"
>     then
>       git bisect skip
>     fi &&
>     # ...

Sure. Will do.

> > +# $HASH1 is good, $HASH4 is bad, we skip $HASH2
> > +# but $HASH3 is good,
> 
> It looks like this comment should have gone above the start of the test
> in the previous hunk.
> 
> But it looks like you accidentally duplicated this test in its entirety
> (with the addition of the misplaced comment) below instead.

No, I think the comment and the test are correct. The first test tests
situation

H1--H2--H3--H4
^   ^   ^   ^
|   bad |   bad
good    skipped

the second test tests situation

H1--H2--H3--H4
^   ^   ^   ^
|   skipped   bad
good    good

So in both cases we can decide about the bad commit besides the skipped
commit. And if the bisection algorithm picks H2 out of H2/H3 (which are
equivalent) then second test tests this situation fully, if the bisection
algorithm picks H3, then the first test tests this situation fully.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 01/27] bisect: Fixup test rev-list-bisect/02
  2021-11-19 16:31     ` Johannes Schindelin
@ 2021-11-22 12:48       ` Jan Kara
  0 siblings, 0 replies; 43+ messages in thread
From: Jan Kara @ 2021-11-22 12:48 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: Chris Torek, Jan Kara, Git List

On Fri 19-11-21 17:31:22, Johannes Schindelin wrote:
> Hi,
> 
> On Thu, 18 Nov 2021, Chris Torek wrote:
> 
> > On Thu, Nov 18, 2021 at 10:38 AM Jan Kara <jack@suse.cz> wrote:
> > > diff --git a/t/t6002-rev-list-bisect.sh b/t/t6002-rev-list-bisect.sh
> > > index b95a0212adff..48db52447fd3 100755
> > > --- a/t/t6002-rev-list-bisect.sh
> > > +++ b/t/t6002-rev-list-bisect.sh
> > > @@ -247,8 +247,9 @@ test_expect_success 'set up fake --bisect refs' '
> > >  test_expect_success 'rev-list --bisect can default to good/bad refs' '
> > >         # the only thing between c3 and c1 is c2
> > >         git rev-parse c2 >expect &&
> > > -       git rev-list --bisect >actual &&
> > > -       test_cmp expect actual
> > > +       git rev-parse b2 >>expect &&
> > > +       actual=$(git rev-list --bisect) &&
> > > +       grep &>/dev/null $actual expect
> >
> > `&>` is a bashism; you need `>/dev/null 2>&1` here for general portability.
> 
> More importantly, why do you suppress the output in the first place? This
> will make debugging breakages harder.
> 
> Let's just not redirect the output?

Sure, I can leave error output alone. I'll do that.

> I do see a more structural problem here, though. Throughout the test
> suite, it is our custom to generate files called `expect` with what we
> consider the expected output, and then generate `actual` with the actual
> output. We then compare the results and complain if they are not
> identical.

A lot of bisection tests do not work like that. Just look through
t/t6030-bisect-porcelain.sh for example. I agree that the usage of 'expect'
and 'actual' may be misleading after my changes though so I will rename
them.

> With this patch, we break that paradigm. All of a sudden, `expect` is not
> at all the expected output anymore, but a haystack in which we want to
> find one thing.
> 
> And even after reading the commit message twice, I am unconvinced that b2
> (whatever that is) might be an equally good choice. I become even more
> doubtful about that statement when I look at the code comment at the
> beginning of the test case:
> 
> 	# the only thing between c3 and c1 is c2
> 
> So either this code comment is wrong, or the patch. And if the code
> comment is wrong, I would like to know when it became wrong, and how, and
> why it slipped through our review.

I agree the comment is confusing but it is more incomplete than outright
wrong. I can fix that. The graph operated by this test looks like:

b1--b2
 \    \
  \c1--c2--c3

b1 and c1 are marked as good, c3 is marked as bad. Now b2 & c2 are indeed
equivalent bisection choices because after picking any of them we may need
one more bisection step to identify the bad commit.

The test including the comment was introduced by 03df567fbf6a
("for_each_bisect_ref(): don't trim refnames") but I cannot really comment
on why the comment passed review :). IMO because it seems obvious enough...

								Honza

-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 03/27] bisect: Fixup test bisect-porcelain/20
  2021-11-18 22:10     ` Taylor Blau
@ 2021-11-22 12:49       ` Jan Kara
  0 siblings, 0 replies; 43+ messages in thread
From: Jan Kara @ 2021-11-22 12:49 UTC (permalink / raw)
  To: Taylor Blau; +Cc: Chris Torek, Jan Kara, Git List

On Thu 18-11-21 17:10:43, Taylor Blau wrote:
> On Thu, Nov 18, 2021 at 12:13:21PM -0800, Chris Torek wrote:
> > On Thu, Nov 18, 2021 at 10:39 AM Jan Kara <jack@suse.cz> wrote:
> > > diff --git a/t/t6030-bisect-porcelain.sh b/t/t6030-bisect-porcelain.sh
> > > index f8cfdd3c36d2..13f7deea4d81 100755
> > > --- a/t/t6030-bisect-porcelain.sh
> > > +++ b/t/t6030-bisect-porcelain.sh
> > > @@ -240,8 +240,13 @@ test_expect_success 'bisect skip: cannot tell between 3 commits' '
> > >  test_expect_success 'bisect skip: cannot tell between 2 commits' '
> > >         test_when_finished git bisect reset &&
> > >         git bisect start $HASH4 $HASH1 &&
> > > -       git bisect skip &&
> > > -       test_expect_code 2 git bisect good >my_bisect_log.txt &&
> > > +       if [ $(git rev-parse HEAD) == $HASH2 ]; then
> > > +               results=('good' 'skip')
> > > +       else
> > > +               results=('skip' 'good')
> > > +       fi &&
> > > +       git bisect ${results[0]} &&
> > > +       test_expect_code 2 git bisect ${results[1]} >my_bisect_log.txt &&
> >
> > These are also not available in old POSIX shell - consider using two
> > separate variables to hold the two strings.
> 
> Or just inlining the commands that you actually want to run inside of
> the if statement above:
> 
>     if test "$HASH2" = "$(git rev-parse HEAD)
>     then
>       git bisect good &&
>       test_expect_code 2 git bisect skip >my_bisect_log.txt
>     else
>       git bisect skip &&
>       test_expect_code 2 git bisect good >my_bisect_log.txt
>     fi && #...
> 
> Here (and in the previous patch) it might be helpful to add a short note
> in these conditionals, maybe along the lines of:
> 
>     # HASH2 and HASH3 are equivalent choices, but we only want to mark
>     # HASH2 as "good". Handle either ordering:
> 
> Same note on the brevity of the subject line applies here, too.

Sure, I'll fix these. Thanks for review.

								Honza

-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: Stochastic bisection support
  2021-11-18 16:49 Stochastic bisection support Jan Kara
                   ` (28 preceding siblings ...)
  2021-11-19 16:39 ` Johannes Schindelin
@ 2021-11-22 12:55 ` Christian Couder
  2021-11-22 13:31   ` Jan Kara
  29 siblings, 1 reply; 43+ messages in thread
From: Christian Couder @ 2021-11-22 12:55 UTC (permalink / raw)
  To: Jan Kara; +Cc: git

Hi,

On Thu, Nov 18, 2021 at 9:33 PM Jan Kara <jack@suse.cz> wrote:
>
> Hello,
>
> In some cases regressions (or generally changes) we are trying to bisect have
> probabilistic nature. This can for example happen for hard to trigger race
> condition where it is difficult to distinguish working state from just not
> hitting the race or it can happen for performance regressions where it is
> sometimes difficult to distinguish random workload fluctuations from the
> regression we are looking for. With standard bisection the only option we have
> is to repeatedly test suggested bisection point until we are sure enough which
> way to go. This leads to rather long bisection times and still a single wrong
> decision whether a commit is good to bad renders the whole bisection useless.
>
> Stochastic bisection tries to address these problems. When deciding whether a
> commit is good or bad, you can also specify your confidence in the decision.
> For performance tests you can usually directly infer this confidence from the
> distance of your current result from good/bad values, for hard to reproduce
> races you are usually 100% confident for bad commits, for good commits you need
> to somehow estimate your confidence based on past experience with reproducing
> the issue. The stochastic bisection algorithm then uses these test results
> and confidences to suggest next commit to try, tracking for each commit the
> probability the commit is the bad one given current test results. Once some
> commit reaches high enough probability (set when starting bisection) of being
> the bad one, we stop bisecting and annouce this commit.

The following project is based on Bayesian Search Theory and might be
interesting if you haven't looked at it:

https://github.com/Ealdwulf/BBChop

Best,
Christian.



















> Example:
>
> Consider an example of a stochastic bisection of the following commits:
>
> A--B--C--D-----F-----H--------K
>  \     \  \-E-/     /        /
>   \     \--------G-/        /
>    \------------------I--J-/
>
> And suppose commit I is the bad one. Let's start bisection with:
>
> # We ask bisection for 90% confidence in the identified commit being bad
> git bisect start --confidence 0.9 %K %A
>
> # Bisection tells us to test %F. Let's assume test went well and we trust
> # our test results on 70%. So:
> git bisect good --confidence 0.7
>
> # Bisection tells us to test %H. Again same result:
> git bisect good --confidence 0.7
>
> # Bisection tells us to test %J. The test should fail for %J (remember %I is
> # the bad commit) but let's assume the test is falsely positive. So:
> git bisect good --confidence 0.7
>
> # We are asked to test %H second time. Assume correct result so:
> git bisect good --confidence 0.7
>
> # We are asked to test %J second time. Assume correct result so:
> git bisect bad --confidence 0.7
>
> # We are asked to test %J again. Assume correct result so:
> git bisect bad --confidence 0.7
>
> # And %J once more. Assume false positive so:
> git bisect good --confidence 0.7
>
> # And %J once more. Assume correct result so:
> git bisect bad --confidence 0.7
>
> # And %J again. Assume correct result so:
> git bisect bad --confidence 0.7
>
> # And now we are asked to test %I. Assume correct result so:
> git bisect bad --confidence 0.7
>
> # We are asked to test %I second time. Assume false positive so:
> git bisect good --confidence 0.7
>
> # And %I once again. Assume correct result so:
> git bisect bad --confidence 0.7
>
> # And %I once again. Assume correct result so:
> git bisect bad --confidence 0.7
>
> # And %I once again. Assume correct result so:
> git bisect bad --confidence 0.7
>
> And now git tells us %I is the bad commit with desired confidence. We can see
> the bisection was able to identify the bad commit although there were three
> false positive tests (out of total 14 tests).
>
> ------
>
> This patch set implements stochastic bisection for git. The first part of the
> series improves some tests so that they accept other valid decisions for
> bisection points. This is needed because to make it easier to share some logic
> between normal and stochastic bisection, I needed to slightly change some bits
> for normal bisection and then since commit weights will be computed in a
> somewhat different order, also chosen bisection points are sometimes different.
>
> The second part of the series then implements stochastic bisection itself.
> Note that I didn't integrate any tests for stochastic bisection into 'make
> test' run yet (so far I did only manual tests) and I still need to update
> manpages etc. I plan to do that but I've decided to post the series now to get
> some early feedback.
>
>                                                                 Honza
>
> PS: Please leave me in CC for replies. I'm not subscribed to the git mailing
> list.

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: Stochastic bisection support
  2021-11-22 12:55 ` Christian Couder
@ 2021-11-22 13:31   ` Jan Kara
  0 siblings, 0 replies; 43+ messages in thread
From: Jan Kara @ 2021-11-22 13:31 UTC (permalink / raw)
  To: Christian Couder; +Cc: Jan Kara, git

Hi!

On Mon 22-11-21 13:55:33, Christian Couder wrote:
> On Thu, Nov 18, 2021 at 9:33 PM Jan Kara <jack@suse.cz> wrote:
> >
> > Hello,
> >
> > In some cases regressions (or generally changes) we are trying to bisect have
> > probabilistic nature. This can for example happen for hard to trigger race
> > condition where it is difficult to distinguish working state from just not
> > hitting the race or it can happen for performance regressions where it is
> > sometimes difficult to distinguish random workload fluctuations from the
> > regression we are looking for. With standard bisection the only option we have
> > is to repeatedly test suggested bisection point until we are sure enough which
> > way to go. This leads to rather long bisection times and still a single wrong
> > decision whether a commit is good to bad renders the whole bisection useless.
> >
> > Stochastic bisection tries to address these problems. When deciding whether a
> > commit is good or bad, you can also specify your confidence in the decision.
> > For performance tests you can usually directly infer this confidence from the
> > distance of your current result from good/bad values, for hard to reproduce
> > races you are usually 100% confident for bad commits, for good commits you need
> > to somehow estimate your confidence based on past experience with reproducing
> > the issue. The stochastic bisection algorithm then uses these test results
> > and confidences to suggest next commit to try, tracking for each commit the
> > probability the commit is the bad one given current test results. Once some
> > commit reaches high enough probability (set when starting bisection) of being
> > the bad one, we stop bisecting and annouce this commit.
> 
> The following project is based on Bayesian Search Theory and might be
> interesting if you haven't looked at it:
> 
> https://github.com/Ealdwulf/BBChop

Thanks for the link. I already know about that project and I had a look
into it when doing some initial research. But the biggest limitation of
that project is that it works only for linear history. I need to generally
bisect Linux kernel repository which has enough merges that the limitation
of linear history makes the use of the above tool impractical.

Furthermore direct integration of stochastic bisection into git makes this
easier to integrate into our performance testing framework.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 43+ messages in thread

end of thread, other threads:[~2021-11-22 13:31 UTC | newest]

Thread overview: 43+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-11-18 16:49 Stochastic bisection support Jan Kara
2021-11-18 16:49 ` [PATCH 01/27] bisect: Fixup test rev-list-bisect/02 Jan Kara
2021-11-18 20:08   ` Chris Torek
2021-11-19 16:31     ` Johannes Schindelin
2021-11-22 12:48       ` Jan Kara
2021-11-18 16:49 ` [PATCH 02/27] bisect: Fixup bisect-porcelain/17 Jan Kara
2021-11-18 22:05   ` Taylor Blau
2021-11-22 12:27     ` Jan Kara
2021-11-18 16:49 ` [PATCH 03/27] bisect: Fixup test bisect-porcelain/20 Jan Kara
2021-11-18 20:13   ` Chris Torek
2021-11-18 22:10     ` Taylor Blau
2021-11-22 12:49       ` Jan Kara
2021-11-18 16:49 ` [PATCH 04/27] bisect: Fixup bisect-porcelain/32 Jan Kara
2021-11-18 16:49 ` [PATCH 05/27] bisect: Fixup bisect-porcelain/34 Jan Kara
2021-11-18 16:49 ` [PATCH 06/27] bisect: Fixup bisect-porcelain/40 Jan Kara
2021-11-18 16:49 ` [PATCH 07/27] bisect: Remove duplicated bisect-porcelain/48 Jan Kara
2021-11-18 16:49 ` [PATCH 08/27] bisect: Fixup bisect-porcelain/50 Jan Kara
2021-11-18 16:49 ` [PATCH 09/27] bisect: Fixup bisect-porcelain/54 Jan Kara
2021-11-18 16:49 ` [PATCH 10/27] bisect: Fixup bisect-porcelain/58 Jan Kara
2021-11-18 16:49 ` [PATCH 11/27] bisect: Fix bisection debugging Jan Kara
2021-11-18 16:49 ` [PATCH 12/27] bisect: Accept and store confidence with each decision Jan Kara
2021-11-18 16:49 ` [PATCH 13/27] bisect: Allow specifying desired result confidence Jan Kara
2021-11-18 16:49 ` [PATCH 14/27] bisect: Use void * for commit_weight Jan Kara
2021-11-18 16:49 ` [PATCH 15/27] bisect: Rename clear_distance() to clear_counted_flag() Jan Kara
2021-11-18 16:49 ` [PATCH 16/27] bisect: Separate commit list reversal Jan Kara
2021-11-18 16:49 ` [PATCH 17/27] bisect: Allow more complex commit weights Jan Kara
2021-11-18 16:49 ` [PATCH 18/27] bisect: Terminate early if there are no eligible commits Jan Kara
2021-11-18 16:49 ` [PATCH 19/27] bisect: Compute reachability of tested revs Jan Kara
2021-11-18 16:49 ` [PATCH 20/27] bisect: Compute probability a particular commit is bad Jan Kara
2021-11-18 16:49 ` [PATCH 21/27] bisect: Reorganize commit weight computation Jan Kara
2021-11-18 16:49 ` [PATCH 22/27] bisect: Move count_distance() Jan Kara
2021-11-18 16:49 ` [PATCH 23/27] bisect: Find bisection point for stochastic weights Jan Kara
2021-11-18 16:49 ` [PATCH 24/27] bisect: Stop bisection when we are confident about bad commit Jan Kara
2021-11-18 16:49 ` [PATCH 25/27] bisect: Report commit with the highest probability Jan Kara
2021-11-18 16:49 ` [PATCH 26/27] bisect: Debug stochastic bisection Jan Kara
2021-11-18 16:49 ` [PATCH 27/27] bisect: Allow bisection debugging of approx_halfway() Jan Kara
2021-11-18 22:49 ` Stochastic bisection support Taylor Blau
2021-11-22 12:13   ` Jan Kara
2021-11-19 16:39 ` Johannes Schindelin
2021-11-20  7:54   ` Chris Torek
2021-11-22 11:57     ` Jan Kara
2021-11-22 12:55 ` Christian Couder
2021-11-22 13:31   ` Jan Kara

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).