git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [RFC] Add bad-branch-first option for git-bisect
@ 2011-01-24  2:03 Shuang He
  2011-01-24  2:05 ` [PATCH] add config option core.bisectbadbranchfirst Shuang He
  2011-01-24  9:53 ` [RFC] Add bad-branch-first option for git-bisect Christian Couder
  0 siblings, 2 replies; 14+ messages in thread
From: Shuang He @ 2011-01-24  2:03 UTC (permalink / raw)
  To: git; +Cc: shuang.he

Hi
      The default git-bisect algorithm will jump around the commit tree,
on the purpose of taking least steps to find the first culprit commit.
We may find it sometime would locate a old culprit commit that we're not
concerned about anymore.
      In most software development, there's one or two main branch which
is maintained for release, and a bunch of feature branches are created
for new feature development or bug fix.  For the reason that sometime
git-bisect will locate a old culprit commit would be:
          1. Quality of those branches may not match the main branch,
some functionality are broken at first and fixed later on the feature
branch. If git-bisect jump to there by chance, git-bisect will only find that old
culprit commit which only exists on that feature branch
          2. Some of those branches may not synchronized with main
branch in time.  Say feature1 is broken when feature2 branch is created, and
feature1 is fixed just a moment later after feature2 branch is created,
and when feature2's development is done, and developer want to merge
feature2 branch back to master branch, feature2 will be firstly
synchronized to master branch tip, then merge into master.  For the same
reason addressed in issue 1, this will also lead git-bisect into wrong
direction.

      In all, we think we do not care about branches that we're not
currently working, unless we're sure the regression is caused by that
branch.

      To address those issue, we propose to add a new config option:
          core.bisectbadbranchfirst::
              With this algorithm, git-bisect will always try to select
commits
              that on the same branch current bad commit sits. And will
fall back
              to default git-bisect algorithm when bad-branch-first
algorithm does
              not apply
          +
          This setting defaults to "false".

      The draft patch will be sent out in a later email, so it could be
reviewed inline.
      Any question or suggestion is welcome  :-)

Thanks
      --Shuang

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH] add config option core.bisectbadbranchfirst
  2011-01-24  2:03 [RFC] Add bad-branch-first option for git-bisect Shuang He
@ 2011-01-24  2:05 ` Shuang He
  2011-01-24  9:53 ` [RFC] Add bad-branch-first option for git-bisect Christian Couder
  1 sibling, 0 replies; 14+ messages in thread
From: Shuang He @ 2011-01-24  2:05 UTC (permalink / raw)
  To: git; +Cc: Shuang He

which enable recursive bad-branch-first algorithm for git-bisect.
With this algorithm, git-bisect will always try to select commits
that on the same branch current bad commit sits. And will fall back
to default git-bisect algorithm when bad-branch-first algorithm does
not apply

Signed-off-by: Shuang He <shuang.he@intel.com>
---
 Documentation/config.txt |    8 ++
 bisect.c                 |  244 +++++++++++++++++++++++++++++++++++++++++-----
 bisect.h                 |    2 +-
 builtin/rev-list.c       |    2 +-
 cache.h                  |    1 +
 config.c                 |    6 +
 environment.c            |    1 +
 7 files changed, 238 insertions(+), 26 deletions(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index ff7c225..8502859 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -558,6 +558,14 @@ core.sparseCheckout::
 	Enable "sparse checkout" feature. See section "Sparse checkout" in
 	linkgit:git-read-tree[1] for more information.
 
+core.bisectbadbranchfirst::
+	With this algorithm, git-bisect will always try to select commits
+	that on the same branch current bad commit sits. And will fall back
+	to default git-bisect algorithm when bad-branch-first algorithm does
+	not apply
++
+This setting defaults to "false".
+
 add.ignore-errors::
 add.ignoreErrors::
 	Tells 'git add' to continue adding files when some files cannot be
diff --git a/bisect.c b/bisect.c
index 060c042..57c410d 100644
--- a/bisect.c
+++ b/bisect.c
@@ -189,6 +189,46 @@ static struct commit_list *best_bisection(struct commit_list *list, int nr)
 	return best;
 }
 
+static struct commit_list *best_bisection_first_parent(struct commit_list *list, struct commit_list *list2, int nr)
+{
+	struct commit_list *p, *best;
+	struct commit_list *p2;
+	int best_distance = -1;
+
+	best = NULL;
+	for (p2 = list2; p2; p2 = p2->next) {
+		int distance;
+		int on_branch;
+
+		on_branch = 0;
+		for (p = list; p; p = p->next) {
+			unsigned flags = p->item->object.flags;
+			if (flags & TREESAME)
+				continue;
+			if (!hashcmp(p->item->object.sha1, p2->item->object.sha1)) {
+				on_branch = 1;
+				break;
+			}
+		}
+
+		if (!on_branch)
+			continue;
+
+		weight_set(p2, weight(p));
+		distance = weight(p);
+		if (nr - distance < distance)
+			distance = nr - distance;
+		if (distance > best_distance) {
+			best = p;
+			best_distance = distance;
+		}
+
+	}
+
+	return list2;
+}
+
+
 struct commit_dist {
 	struct commit *commit;
 	int distance;
@@ -253,7 +293,7 @@ static struct commit_list *best_bisection_sorted(struct commit_list *list, int n
  * unknown.  After running count_distance() first, they will get zero
  * or positive distance.
  */
-static struct commit_list *do_find_bisection(struct commit_list *list,
+static struct commit_list *do_find_bisection(struct commit_list *list, struct commit_list *list2,
 					     int nr, int *weights,
 					     int find_all)
 {
@@ -314,7 +354,7 @@ static struct commit_list *do_find_bisection(struct commit_list *list,
 		clear_distance(list);
 
 		/* Does it happen to be at exactly half-way? */
-		if (!find_all && halfway(p, nr))
+		if (!list2 && !find_all && halfway(p, nr))
 			return p;
 		counted++;
 	}
@@ -352,20 +392,27 @@ static struct commit_list *do_find_bisection(struct commit_list *list,
 				weight_set(p, weight(q));
 
 			/* Does it happen to be at exactly half-way? */
-			if (!find_all && halfway(p, nr))
+			if (!list2 && !find_all && halfway(p, nr))
 				return p;
 		}
 	}
 
 	show_list("bisection 2 counted all", counted, nr, list);
 
+	if (list2) {
+		struct commit_list *t;
+		t = best_bisection_first_parent(list, list2, nr);
+		t = best_bisection_sorted(t, nr);
+		return t;
+	}
+
 	if (!find_all)
 		return best_bisection(list, nr);
 	else
 		return best_bisection_sorted(list, nr);
 }
 
-struct commit_list *find_bisection(struct commit_list *list,
+struct commit_list *find_bisection(struct commit_list *list, struct commit_list *list2,
 					  int *reaches, int *all,
 					  int find_all)
 {
@@ -400,7 +447,7 @@ struct commit_list *find_bisection(struct commit_list *list,
 	weights = xcalloc(on_list, sizeof(*weights));
 
 	/* Do the real work of finding bisection commit. */
-	best = do_find_bisection(list, nr, weights, find_all);
+	best = do_find_bisection(list, list2, nr, weights, find_all);
 	if (best) {
 		if (!find_all)
 			best->next = NULL;
@@ -683,6 +730,33 @@ static void bisect_rev_setup(struct rev_info *revs, const char *prefix,
 	setup_revisions(rev_argv.argv_nr, rev_argv.argv, revs, NULL);
 }
 
+static void bisect_rev_setup_first_parent(struct rev_info *revs, const char *prefix,
+			     const char *bad_format, const char *good_format,
+			     int read_paths)
+{
+	struct argv_array rev_argv = { NULL, 0, 0 };
+	int i;
+
+	init_revisions(revs, prefix);
+	revs->abbrev = 0;
+	revs->commit_format = CMIT_FMT_UNSPECIFIED;
+
+	/* rev_argv.argv[0] will be ignored by setup_revisions */
+	argv_array_push(&rev_argv, xstrdup("bisect_rev_setup_first_parent"));
+	argv_array_push_sha1(&rev_argv, current_bad_sha1, bad_format);
+	for (i = 0; i < good_revs.sha1_nr; i++)
+		argv_array_push_sha1(&rev_argv, good_revs.sha1[i],
+				     good_format);
+	argv_array_push(&rev_argv, xstrdup("--first-parent"));
+	argv_array_push(&rev_argv, xstrdup("--"));
+	if (read_paths)
+		read_bisect_paths(&rev_argv);
+	argv_array_push(&rev_argv, NULL);
+
+	setup_revisions(rev_argv.argv_nr, rev_argv.argv, revs, NULL);
+}
+
+
 static void bisect_common(struct rev_info *revs)
 {
 	if (prepare_revision_walk(revs))
@@ -944,6 +1018,24 @@ static void show_diff_tree(const char *prefix, struct commit *commit)
 	log_tree_commit(&opt, commit);
 }
 
+int is_merge_commit(struct commit_list *entry) {
+	int nr = 0;
+	struct commit *commit;
+	struct commit_list *p;
+
+	if (entry) {
+		p = entry->item->parents;
+
+		while (p) {
+			nr += 1;
+			p = p->next;
+		}
+	}
+
+	return nr - 1;
+}
+
+
 /*
  * We use the convention that exiting with an exit code 10 means that
  * the bisection process finished successfully.
@@ -952,41 +1044,145 @@ static void show_diff_tree(const char *prefix, struct commit *commit)
 int bisect_next_all(const char *prefix)
 {
 	struct rev_info revs;
+	struct rev_info first_parent_revs;
 	struct commit_list *tried;
 	int reaches = 0, all = 0, nr, steps;
+	struct object_array pending_copy;
+	struct object_array pending_copy2;
 	const unsigned char *bisect_rev;
+	int i;
 	char bisect_rev_hex[41];
+	int bad_branch_first_working = 1;
 
 	if (read_bisect_refs())
 		die("reading bisect refs failed");
 
 	check_good_are_ancestors_of_bad(prefix);
+	struct argv_array rev_argv = { NULL, 0, 0 };
+	
+	read_bisect_paths(&rev_argv);
+	if (rev_argv.argv_nr > 0)
+		core_bisect_bad_branch_first = 0;
+
+	if (core_bisect_bad_branch_first) {
+		bisect_rev_setup_first_parent(&first_parent_revs, prefix, "%s", "^%s", 1);
+		memset(&pending_copy, 0, sizeof(pending_copy));
+		for (i = 0; i < first_parent_revs.pending.nr; i++)
+			add_object_array(first_parent_revs.pending.objects[i].item,
+					 first_parent_revs.pending.objects[i].name,
+					 &pending_copy);
+
+		bisect_common(&first_parent_revs);
+
+		/* Clean up objects used, as they will be reused. */
+		for (i = 0; i < pending_copy.nr; i++) {
+			struct object *o = pending_copy.objects[i].item;
+			clear_commit_marks((struct commit *)o, ALL_REV_FLAGS);
+		}
+	}
 
-	bisect_rev_setup(&revs, prefix, "%s", "^%s", 1);
-	revs.limited = 1;
 
-	bisect_common(&revs);
 
-	revs.commits = find_bisection(revs.commits, &reaches, &all,
-				       !!skipped_revs.sha1_nr);
-	revs.commits = managed_skipped(revs.commits, &tried);
+	if (core_bisect_bad_branch_first) {
+		bisect_rev_setup(&revs, prefix, "%s", "^%s", 1);
+		memset(&pending_copy2, 0, sizeof(pending_copy2));
+		for (i = 0; i < revs.pending.nr; i++)
+			add_object_array(revs.pending.objects[i].item,
+					 revs.pending.objects[i].name,
+					 &pending_copy2);
+
+		revs.limited = 1;
+
+		bisect_common(&revs);
+
+		revs.commits = find_bisection(revs.commits, first_parent_revs.commits, &reaches, &all,
+					       !!skipped_revs.sha1_nr);
+		/* TODO: Check if this is a first bad commit on bad branch, and if it's a merge commit */
+		/* If this is a merge commit, we need to try all of its parents, until we found the merged bad branch */
+		/* If all parents are good, we just announce it as first bad commit */
+		if (!hashcmp(revs.commits->item->object.sha1, current_bad_sha1)) {
+			printf("%s is the first bad commit\n", sha1_to_hex(current_bad_sha1));
+
+			if (is_merge_commit(revs.commits)) {
+				int all_parents_good = 1;
+				struct commit_list *p;
+
+				printf("%s is a merge commit\n", sha1_to_hex(current_bad_sha1));
+				printf("need to try each merged branch\n", sha1_to_hex(current_bad_sha1));
+				p = revs.commits->item->parents;
+
+				while (p) {
+					if (!hashcmp(p->item->object.sha1, current_bad_sha1)) {
+						fprintf(stderr, "should not get here\n");
+						exit(1);
+					} else if (0 <= lookup_sha1_array(&good_revs, p->item->object.sha1)) {
+						fprintf(stderr, "good parent %s\n", sha1_to_hex(p->item->object.sha1));
+					} else if (0 <= lookup_sha1_array(&skipped_revs, p->item->object.sha1)) {
+						fprintf(stderr, "skipped parent %s\n", sha1_to_hex(p->item->object.sha1));
+						all_parents_good = 0;
+					} else {
+						printf("try merged branch %s\n", sha1_to_hex(p->item->object.sha1));
+						exit(bisect_checkout(sha1_to_hex(p->item->object.sha1)));
+					}
+					p = p->next;
+				}
+
+				if (all_parents_good) {
+					show_diff_tree(prefix, revs.commits->item);
+					exit(10);
+				}
+			}
+			else {
+				show_diff_tree(prefix, revs.commits->item);
+				exit(10);
+			}
+		}
+
+		revs.commits = managed_skipped(revs.commits, &tried);
 
-	if (!revs.commits) {
-		/*
-		 * We should exit here only if the "bad"
-		 * commit is also a "skip" commit.
-		 */
-		exit_if_skipped_commits(tried, NULL);
+		if (!revs.commits || !all) {
+			bad_branch_first_working = 0;
+			fprintf(stderr, "fall back to default git-bisect algorithm\n");
+		}
+		else
+			fprintf(stderr, "proceed with core_bisect_bad_branch_first algorithm\n");
 
-		printf("%s was both good and bad\n",
-		       sha1_to_hex(current_bad_sha1));
-		exit(1);
+		/* Clean up objects used, as they will be reused. */
+		for (i = 0; i < pending_copy2.nr; i++) {
+			struct object *o = pending_copy2.objects[i].item;
+			clear_commit_marks((struct commit *)o, ALL_REV_FLAGS);
+		}
 	}
 
-	if (!all) {
-		fprintf(stderr, "No testable commit found.\n"
-			"Maybe you started with bad path parameters?\n");
-		exit(4);
+
+	if (!core_bisect_bad_branch_first || !bad_branch_first_working) {
+		bisect_rev_setup(&revs, prefix, "%s", "^%s", 1);
+		revs.limited = 1;
+
+		bisect_common(&revs);
+		show_list("fall back", 0, 0, revs.commits);
+
+		revs.commits = find_bisection(revs.commits, NULL, &reaches, &all,
+					       !!skipped_revs.sha1_nr);
+		revs.commits = managed_skipped(revs.commits, &tried);
+
+		if (!revs.commits) {
+			/*
+			 * We should exit here only if the "bad"
+			 * commit is also a "skip" commit.
+			 */
+			exit_if_skipped_commits(tried, NULL);
+	
+			printf("%s was both good and bad\n",
+			       sha1_to_hex(current_bad_sha1));
+			exit(1);
+		}
+
+		if (!all) {
+			fprintf(stderr, "No testable commit found.\n"
+				"Maybe you started with bad path parameters?\n");
+			exit(4);
+		}
 	}
 
 	bisect_rev = revs.commits->item->object.sha1;
diff --git a/bisect.h b/bisect.h
index 0862ce5..fad4fbe 100644
--- a/bisect.h
+++ b/bisect.h
@@ -1,7 +1,7 @@
 #ifndef BISECT_H
 #define BISECT_H
 
-extern struct commit_list *find_bisection(struct commit_list *list,
+extern struct commit_list *find_bisection(struct commit_list *list, struct commit_list *list2,
 					  int *reaches, int *all,
 					  int find_all);
 
diff --git a/builtin/rev-list.c b/builtin/rev-list.c
index ba27d39..ab56f6b 100644
--- a/builtin/rev-list.c
+++ b/builtin/rev-list.c
@@ -399,7 +399,7 @@ int cmd_rev_list(int argc, const char **argv, const char *prefix)
 	if (bisect_list) {
 		int reaches = reaches, all = all;
 
-		revs.commits = find_bisection(revs.commits, &reaches, &all,
+		revs.commits = find_bisection(revs.commits, NULL, &reaches, &all,
 					      bisect_find_all);
 
 		if (bisect_show_vars)
diff --git a/cache.h b/cache.h
index d83d68c..bfd4448 100644
--- a/cache.h
+++ b/cache.h
@@ -559,6 +559,7 @@ extern int read_replace_refs;
 extern int fsync_object_files;
 extern int core_preload_index;
 extern int core_apply_sparse_checkout;
+extern int core_bisect_bad_branch_first;
 
 enum safe_crlf {
 	SAFE_CRLF_FALSE = 0,
diff --git a/config.c b/config.c
index 625e051..b37a70c 100644
--- a/config.c
+++ b/config.c
@@ -660,6 +660,12 @@ static int git_default_core_config(const char *var, const char *value)
 		return 0;
 	}
 
+	if (!strcmp(var, "core.bisectbadbranchfirst")) {
+		core_bisect_bad_branch_first = git_config_bool(var, value);
+		return 0;
+	}
+
+
 	/* Add other config variables here and to Documentation/config.txt. */
 	return 0;
 }
diff --git a/environment.c b/environment.c
index 9564475..cf813af 100644
--- a/environment.c
+++ b/environment.c
@@ -55,6 +55,7 @@ enum object_creation_mode object_creation_mode = OBJECT_CREATION_MODE;
 char *notes_ref_name;
 int grafts_replace_parents = 1;
 int core_apply_sparse_checkout;
+int core_bisect_bad_branch_first = 0;
 struct startup_info *startup_info;
 
 /* Parallel index stat data preload? */
-- 
1.7.4.rc2.21.g7f7a8

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [RFC] Add bad-branch-first option for git-bisect
  2011-01-24  2:03 [RFC] Add bad-branch-first option for git-bisect Shuang He
  2011-01-24  2:05 ` [PATCH] add config option core.bisectbadbranchfirst Shuang He
@ 2011-01-24  9:53 ` Christian Couder
  2011-01-24 10:30   ` Shuang He
  2011-01-24 20:28   ` Avery Pennarun
  1 sibling, 2 replies; 14+ messages in thread
From: Christian Couder @ 2011-01-24  9:53 UTC (permalink / raw)
  To: Shuang He; +Cc: git, apenwarr

Hi,

On Mon, Jan 24, 2011 at 3:03 AM, Shuang He <shuang.he@intel.com> wrote:
> Hi
>     The default git-bisect algorithm will jump around the commit tree,
> on the purpose of taking least steps to find the first culprit commit.
> We may find it sometime would locate a old culprit commit that we're not
> concerned about anymore.

Yes, it can be a problem.

>     In most software development, there's one or two main branch which
> is maintained for release, and a bunch of feature branches are created
> for new feature development or bug fix.  For the reason that sometime
> git-bisect will locate a old culprit commit would be:
>         1. Quality of those branches may not match the main branch,
> some functionality are broken at first and fixed later on the feature
> branch. If git-bisect jump to there by chance, git-bisect will only find
> that old
> culprit commit which only exists on that feature branch

If the quality of these branches is too bad, I think they should not
have been merged in the first place.
If they are not merged (and not marked as good), then git bisect will
not look at them, since it will look only at commits that are
ancestors of the bad commit it is given.

Or if one is merged but it causes too many problems, then perhaps a
replacement commit could be used to unmerge the branch.

Another possibility is to have in a file a list of commits that are
the last commits on these branches before the merge commits, and do a:

git bisect good $(cat good_commits_file.txt)

at the beginning of each bisection.

So I think the long term solution in this case is not what your are suggesting.

>         2. Some of those branches may not synchronized with main
> branch in time.  Say feature1 is broken when feature2 branch is created, and
> feature1 is fixed just a moment later after feature2 branch is created,
> and when feature2's development is done, and developer want to merge
> feature2 branch back to master branch, feature2 will be firstly
> synchronized to master branch tip, then merge into master.  For the same
> reason addressed in issue 1, this will also lead git-bisect into wrong
> direction.

I am not sure what you mean by " feature2 will be firstly synchronized
to master branch tip", and I think this should mean a rebase that
would fix the bug if feature1 has already been merged into the master
branch.

But anyway in this case, I think that git bisect will find that the
first bad commit is the last commit in the branch, just before it was
merged. And by looking at the branch graph it should be quite easy to
understand what happened.

And then the obvious thing to do is to decide to just start a new
bisection like it was started the first time but with an added "git
bisect good <merge_commit>", where <merge_commit> is the commit that
merges the branch.

>     In all, we think we do not care about branches that we're not
> currently working, unless we're sure the regression is caused by that
> branch.
>
>     To address those issue, we propose to add a new config option:
>         core.bisectbadbranchfirst::
>             With this algorithm, git-bisect will always try to select
> commits
>             that on the same branch current bad commit sits. And will
> fall back
>             to default git-bisect algorithm when bad-branch-first
> algorithm does
>             not apply
>         +
>         This setting defaults to "false".

I am not opposed to an option to bisect on the first parents of the
bad commit only. And after a very fast look at your patch it seems to
be what it does. By the way Avery Pennarun's gitbuilder
(https://github.com/apenwarr/gitbuilder) does the same thing. So I
know some people are interested in such a feature.

But here are some suggestions/comments:

- your explanations about why it could be useful should be improved,
- the name "bisectbadbranchfirst" seems wrong to me, because git
branches are just some special tags; "firstparentsonly" would be a
better name,
- before having a config option for it, why not have it first as an
option to "git bisect start",
- if there is a config option, then there should probably be an option
to "git bisect start" to use the regular algorithm,
- it seems to me that bisecting only on first parents could fail only
if some "good" commits are not ancestors of the bad commit, and if the
bug is fixed on a branch where such a commit is; so I don't see the
point in defaulting to the usual algorithm in this case because in
this case the usual algorithm just stops,
- perhaps the message given when the first bad commit is found should
be changed a little bit when this option is used.

Maybe I am missing something, because as I said I didn't read your
patch carefully, but in this case please try to give a concrete
example.

Thanks,
Christian.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC] Add bad-branch-first option for git-bisect
  2011-01-24  9:53 ` [RFC] Add bad-branch-first option for git-bisect Christian Couder
@ 2011-01-24 10:30   ` Shuang He
  2011-01-24 10:50     ` Johannes Sixt
  2011-01-25  9:20     ` Christian Couder
  2011-01-24 20:28   ` Avery Pennarun
  1 sibling, 2 replies; 14+ messages in thread
From: Shuang He @ 2011-01-24 10:30 UTC (permalink / raw)
  To: Christian Couder; +Cc: git@vger.kernel.org, apenwarr@gmail.com

On 2011/1/24 17:53, Christian Couder wrote:
> Hi,
>
> On Mon, Jan 24, 2011 at 3:03 AM, Shuang He<shuang.he@intel.com>  wrote:
>> Hi
>>      The default git-bisect algorithm will jump around the commit tree,
>> on the purpose of taking least steps to find the first culprit commit.
>> We may find it sometime would locate a old culprit commit that we're not
>> concerned about anymore.
> Yes, it can be a problem.

I'm honored to be given so much comment :)
Thank you

>>      In most software development, there's one or two main branch which
>> is maintained for release, and a bunch of feature branches are created
>> for new feature development or bug fix.  For the reason that sometime
>> git-bisect will locate a old culprit commit would be:
>>          1. Quality of those branches may not match the main branch,
>> some functionality are broken at first and fixed later on the feature
>> branch. If git-bisect jump to there by chance, git-bisect will only find
>> that old
>> culprit commit which only exists on that feature branch
> If the quality of these branches is too bad, I think they should not
> have been merged in the first place.
> If they are not merged (and not marked as good), then git bisect will
> not look at them, since it will look only at commits that are
> ancestors of the bad commit it is given.
>
> Or if one is merged but it causes too many problems, then perhaps a
> replacement commit could be used to unmerge the branch.
>
> Another possibility is to have in a file a list of commits that are
> the last commits on these branches before the merge commits, and do a:
>
> git bisect good $(cat good_commits_file.txt)
>
> at the beginning of each bisection.
>
> So I think the long term solution in this case is not what your are suggesting.

Yeah, I agree that the issue I addressed above will not be a problem if 
all those branches are maintained very well.
Actually we've implemented a automated bisect system for Intel Linux 
Graphics Driver Project, and so we'd like the system
helps us to locate issue in an more automatic way when branches are not 
maintained as good as expected.

>>          2. Some of those branches may not synchronized with main
>> branch in time.  Say feature1 is broken when feature2 branch is created, and
>> feature1 is fixed just a moment later after feature2 branch is created,
>> and when feature2's development is done, and developer want to merge
>> feature2 branch back to master branch, feature2 will be firstly
>> synchronized to master branch tip, then merge into master.  For the same
>> reason addressed in issue 1, this will also lead git-bisect into wrong
>> direction.
> I am not sure what you mean by " feature2 will be firstly synchronized
> to master branch tip", and I think this should mean a rebase that
> would fix the bug if feature1 has already been merged into the master
> branch.
>
> But anyway in this case, I think that git bisect will find that the
> first bad commit is the last commit in the branch, just before it was
> merged. And by looking at the branch graph it should be quite easy to
> understand what happened.
>
> And then the obvious thing to do is to decide to just start a new
> bisection like it was started the first time but with an added "git
> bisect good<merge_commit>", where<merge_commit>  is the commit that
> merges the branch.

For the same reason, that we're implementing automated bisect system , 
so we want git-bisect to
be able to help with this condition also.

>>      In all, we think we do not care about branches that we're not
>> currently working, unless we're sure the regression is caused by that
>> branch.
>>
>>      To address those issue, we propose to add a new config option:
>>          core.bisectbadbranchfirst::
>>              With this algorithm, git-bisect will always try to select
>> commits
>>              that on the same branch current bad commit sits. And will
>> fall back
>>              to default git-bisect algorithm when bad-branch-first
>> algorithm does
>>              not apply
>>          +
>>          This setting defaults to "false".
> I am not opposed to an option to bisect on the first parents of the
> bad commit only. And after a very fast look at your patch it seems to
> be what it does. By the way Avery Pennarun's gitbuilder
> (https://github.com/apenwarr/gitbuilder) does the same thing. So I
> know some people are interested in such a feature.
>
> But here are some suggestions/comments:
>
> - your explanations about why it could be useful should be improved,

I don't have much data to prove it. I could just say it could help if 
branches are not maintained very well.

> - the name "bisectbadbranchfirst" seems wrong to me, because git
> branches are just some special tags; "firstparentsonly" would be a
> better name,

It's recursively applying bad branch first algorithm, not just 
constantly stick to first parent.
Given this condition:
     A -> B -> C -> D -> E -> F -> G -> H   (master)
          \ a  -> b -> c -> d -> e /  (feature 1)
               \ x -> y -> z/      (feature 2)
start with H as bad commit, and A as good commit, if y is the target bad 
commit. bad-branch-first algorithm will do it like this:
     1. In first round stick to master branch, so it will locate G as 
first bad commit
     2. In second round stick to feature1 branch, then it will locate d 
as first bad commit
     3. In third round stick to feature2 branch, then it will finally 
locate y as first bad commit
So you could see, it's always sticking to branch where current bad 
commit sit


> - before having a config option for it, why not have it first as an
> option to "git bisect start",

Agree, I have thought about it. Haven't done it yet

> - if there is a config option, then there should probably be an option
> to "git bisect start" to use the regular algorithm,

Agree

> - it seems to me that bisecting only on first parents could fail only
> if some "good" commits are not ancestors of the bad commit, and if the
> bug is fixed on a branch where such a commit is; so I don't see the
> point in defaulting to the usual algorithm in this case because in
> this case the usual algorithm just stops,

It should stop as default git-bisect do, when
There are a few cases that this algorithm will fall back to default 
algorithm:
     when bisect path is specified. Since --first-parent seems not 
working as expected when file path is specified
     when bad branch first algorithm could not find a commit to bisect

> - perhaps the message given when the first bad commit is found should
> be changed a little bit when this option is used.

Yeah, agree

> Maybe I am missing something, because as I said I didn't read your
> patch carefully, but in this case please try to give a concrete
> example.
>
> Thanks,
> Christian.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC] Add bad-branch-first option for git-bisect
  2011-01-24 10:30   ` Shuang He
@ 2011-01-24 10:50     ` Johannes Sixt
  2011-01-24 11:05       ` Shuang He
  2011-01-25  9:20     ` Christian Couder
  1 sibling, 1 reply; 14+ messages in thread
From: Johannes Sixt @ 2011-01-24 10:50 UTC (permalink / raw)
  To: Shuang He; +Cc: Christian Couder, git@vger.kernel.org, apenwarr@gmail.com

Am 1/24/2011 11:30, schrieb Shuang He:
> It's recursively applying bad branch first algorithm, not just constantly
> stick to first parent.
> Given this condition:
>     A -> B -> C -> D -> E -> F -> G -> H   (master)
>          \ a  -> b -> c -> d -> e /  (feature 1)
>               \ x -> y -> z/      (feature 2)
> start with H as bad commit, and A as good commit, if y is the target bad
> commit. bad-branch-first algorithm will do it like this:
>     1. In first round stick to master branch, so it will locate G as first
> bad commit
>     2. In second round stick to feature1 branch, then it will locate d as
> first bad commit
>     3. In third round stick to feature2 branch, then it will finally
> locate y as first bad commit
> So you could see, it's always sticking to branch where current bad commit sit

Ok, so you explain what your algorithm does.

But you did not illustrate your problem. The history above is ordinary,
somewhat branchy, has *ONE* commit that introduces a regression, and *NO*
commit that fixes the regression. But in your rationale you said something
about "feature1 is fixed just a moment later after feature2 branch is
created". How does this fit into the picture, where is the problem, and
how does your algorithm solve it?

-- Hannes

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC] Add bad-branch-first option for git-bisect
  2011-01-24 10:50     ` Johannes Sixt
@ 2011-01-24 11:05       ` Shuang He
  2011-01-24 20:04         ` Junio C Hamano
  0 siblings, 1 reply; 14+ messages in thread
From: Shuang He @ 2011-01-24 11:05 UTC (permalink / raw)
  To: Johannes Sixt; +Cc: Christian Couder, git@vger.kernel.org, apenwarr@gmail.com

On 2011/1/24 18:50, Johannes Sixt wrote:
> Am 1/24/2011 11:30, schrieb Shuang He:
>> It's recursively applying bad branch first algorithm, not just constantly
>> stick to first parent.
>> Given this condition:
>>      A ->  B ->  C ->  D ->  E ->  F ->  G ->  H   (master)
>>           \ a  ->  b ->  c ->  d ->  e /  (feature 1)
>>                \ x ->  y ->  z/      (feature 2)
>> start with H as bad commit, and A as good commit, if y is the target bad
>> commit. bad-branch-first algorithm will do it like this:
>>      1. In first round stick to master branch, so it will locate G as first
>> bad commit
>>      2. In second round stick to feature1 branch, then it will locate d as
>> first bad commit
>>      3. In third round stick to feature2 branch, then it will finally
>> locate y as first bad commit
>> So you could see, it's always sticking to branch where current bad commit sit
> Ok, so you explain what your algorithm does.
>
> But you did not illustrate your problem. The history above is ordinary,
> somewhat branchy, has *ONE* commit that introduces a regression, and *NO*
> commit that fixes the regression. But in your rationale you said something
> about "feature1 is fixed just a moment later after feature2 branch is
> created". How does this fit into the picture, where is the problem, and
> how does your algorithm solve it?
>
> -- Hannes

If A is bad commit, and C fixed it, and then F is bad again,

A ->  B ->  C ->  D ->  E ->  F ->  G ->  H   (master)
   \                    \      /
     a  ->  b... c ->  d ->  e->f  (feature 1)

Start with H as bad commit, and D as good commit, it's possible git-bisect would jump to c, and it will lead to wrong direction

If bad-branch-first is used, it would be:
1. first round found F
2. end

Thanks
	--Shuang

Thanks
	--Shuang

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC] Add bad-branch-first option for git-bisect
  2011-01-24 11:05       ` Shuang He
@ 2011-01-24 20:04         ` Junio C Hamano
  2011-01-25  3:27           ` Shuang He
  0 siblings, 1 reply; 14+ messages in thread
From: Junio C Hamano @ 2011-01-24 20:04 UTC (permalink / raw)
  To: Shuang He
  Cc: Johannes Sixt, Christian Couder, git@vger.kernel.org,
	apenwarr@gmail.com

Shuang He <shuang.he@intel.com> writes:

> If A is bad commit, and C fixed it, and then F is bad again,
>
> A ->  B ->  C ->  D ->  E ->  F ->  G ->  H   (master)
>   \                    \      /
>     a  ->  b... c ->  d ->  e->f  (feature 1)
>
> Start with H as bad commit, and D as good commit, it's possible git-bisect would jump to c, and it will lead to wrong direction
>
> If bad-branch-first is used, it would be:
> 1. first round found F
> 2. end

It is unclear from the way you drew the picture if "F" is supposed to be a
merge of "E" and "f", but I'd assume that it is.

So what you are saying in 1. is "skip from H until you hit a first merge
(without testing any intermediate commit), find F and stop to check it,
and find that it is broken".

What makes you decide "2. end"?  The fact that both of its parents "E" and
"f" are Ok?  IOW, it won't be "2. end" if one of the parents of the merge
is broken?

What if there is _no_ merge from a side branch but there were breakages in
A (fixed in C) and then F in your original picture, i.e.

  A---B---C---D---E---F---G---H (broken)
  x       o           x       

and you are hunting for the bug starting from H?  How does your algorithm
help?  I grossed over the linear part by saying "skip from H until you hit
a first merge", but in general, what is your plan to handle linear part of
the history?

A totally unacceptable answer is "It does not help linear case, but it
helps when there are merges".  The a-thru-f side branch in your picture,
or any "culprit side branch that was merged" your algorithm finds in
general, would eventually have a linear segment, and having x-o-x in the
history fundmentally breaks "bisect"---your band-aid will not help.

The whole idea behind using "bisect" to gain efficiency in isolating the
issue depends on "Once you see a Good commit, you do not have to search
beyond its ancestors", as it is to look for a single breakage that
persists to the "Bad" commit you give, and as far as "bisect" is
concerned, the breakage at A in your example is an unrelated breakage that
did not persist through the history to the "Bad" commit H.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC] Add bad-branch-first option for git-bisect
  2011-01-24  9:53 ` [RFC] Add bad-branch-first option for git-bisect Christian Couder
  2011-01-24 10:30   ` Shuang He
@ 2011-01-24 20:28   ` Avery Pennarun
  2011-01-26  7:11     ` Shuang He
  1 sibling, 1 reply; 14+ messages in thread
From: Avery Pennarun @ 2011-01-24 20:28 UTC (permalink / raw)
  To: Christian Couder; +Cc: Shuang He, git, Johannes Sixt, Junio C Hamano

On Mon, Jan 24, 2011 at 1:53 AM, Christian Couder
<christian.couder@gmail.com> wrote:
> I am not opposed to an option to bisect on the first parents of the
> bad commit only. And after a very fast look at your patch it seems to
> be what it does. By the way Avery Pennarun's gitbuilder
> (https://github.com/apenwarr/gitbuilder) does the same thing. So I
> know some people are interested in such a feature.

Just some notes on gitbuilder's algorithm, since I haven't spent the
time to fully understand Shuang's proposal.

I do understand at least one of his concerns, that is, that people
like to do a lot of "messy" development on a branch, and when the
branch is done, merge the whole messy branch into the "mainline".  The
messy branch would then have a lot of commits that break a lot of
things before fixing them again later.

In a corporate environment, this method allows people to work all day,
make frequent commits, pull from other branches at will, and never
risk their lives by doing poorly-educated rebases.  It works pretty
well *until* you try to bisect, at which time all these messy commits
start to bite you.

gitbuilder's bisection is a total hack around this situation, although
it happens to work perfectly in the workflow it was designed for, thus
making me feel clever.

Basically, we push/fetch *all* the branches from *everybody* into a
single repo, and build all of them as frequently as we can.  If you
think about it, if you have all the branches that someone might have
pulled/merged from, then you don't have to think of the git history as
a whole complicated DAG; you can just think of it as a whole bunch of
separate chunks of linear history.  Moreover, as long as people are
careful to only pull from a branch when that branch is passing all
tests - which you can easily see by looking at the gitbuilder console
- then playing inside each of these chunks of linear history can help
you figure out where particular bugs were introduced during "messy"
branches.

It also allows you a nice separation of concerns.  The owner of the
mainline branch (the "integration manager" person) only really cares
about which branch they merged that caused a problem, because that
person doesn't want to fix bugs, he/she simply wants to know who owns
the failing branch, so that person can fix *their* bug and their
branch will merge without breaking things.

So this is why gitbuilder uses "git rev-list --first-parent" during
its "fake bisection" operation: because a different person is
responsible for each "linear chunk" of history.

Note that you have to use --no-ff when merging if you want this to
work reliably.  But the build manager person can just remember to do
that.  Combining --no-ff and --ff-only (which sound mutually exclusive
but aren't) is a way to be extra specially sure.

Now, if you aren't using gitbuilder, what we want from "bisection" is
not quite the same, but let's imagine that you at least have a similar
setup, where people *only* ever merge into the mainline by using
--no-ff.  In that case, you'd like a bisect operation that *starts* by
using --first-parent, which will tell you which merge caused the
problem.  After that, you might want to bisect into the branch.

(I don't actually remember if 'git bisect' understands --first-parent
correctly.  gitbuilder doesn't exactly bisect either, but that's
another story and not relevant right now.)

I can actually imagine that there are many more projects that do what
I'm talking about - "messy" branches that get broken and fixed over
time, then merge into a "clean" mainline - than projects (like the
kernel and git.git) that try to keep all branches clean at all times.
Thus, I could see some argument that a "--first-parents first"
bisection would actually help out a lot of people, and maybe even
deserves to be the default.

I don't really care though, I just use gitbuilder :)

Have fun,

Avery

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC] Add bad-branch-first option for git-bisect
  2011-01-24 20:04         ` Junio C Hamano
@ 2011-01-25  3:27           ` Shuang He
  0 siblings, 0 replies; 14+ messages in thread
From: Shuang He @ 2011-01-25  3:27 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Johannes Sixt, Christian Couder, git@vger.kernel.org,
	apenwarr@gmail.com

On 2011/1/25 4:04, Junio C Hamano wrote:
> Shuang He<shuang.he@intel.com>  writes:
>
>> If A is bad commit, and C fixed it, and then F is bad again,
>>
>> A ->   B ->   C ->   D ->   E ->   F ->   G ->   H   (master)
>>    \                    \      /
>>      a  ->   b... c ->   d ->   e->f  (feature 1)
>>
>> Start with H as bad commit, and D as good commit, it's possible git-bisect would jump to c, and it will lead to wrong direction
>>
>> If bad-branch-first is used, it would be:
>> 1. first round found F
>> 2. end
> It is unclear from the way you drew the picture if "F" is supposed to be a
> merge of "E" and "f", but I'd assume that it is.

Oh, I lost this mail
That graph is different from what I meant, when shown in different email 
client.
It's G which is merged from e and F

> So what you are saying in 1. is "skip from H until you hit a first merge
> (without testing any intermediate commit), find F and stop to check it,
> and find that it is broken".
>
> What makes you decide "2. end"?  The fact that both of its parents "E" and
> "f" are Ok?  IOW, it won't be "2. end" if one of the parents of the merge
> is broken?

I think the correction above should have answer those two questions.

> What if there is _no_ merge from a side branch but there were breakages in
> A (fixed in C) and then F in your original picture, i.e.
>
>    A---B---C---D---E---F---G---H (broken)
>    x       o           x
>
> and you are hunting for the bug starting from H?  How does your algorithm
> help?  I grossed over the linear part by saying "skip from H until you hit
> a first merge", but in general, what is your plan to handle linear part of
> the history?

If the history is linear, the new algorithm won't help, it will just 
behavior like default git-bisect algorithm.

> A totally unacceptable answer is "It does not help linear case, but it
> helps when there are merges".  The a-thru-f side branch in your picture,
> or any "culprit side branch that was merged" your algorithm finds in
> general, would eventually have a linear segment, and having x-o-x in the
> history fundmentally breaks "bisect"---your band-aid will not help.
>
> The whole idea behind using "bisect" to gain efficiency in isolating the
> issue depends on "Once you see a Good commit, you do not have to search
> beyond its ancestors", as it is to look for a single breakage that
> persists to the "Bad" commit you give, and as far as "bisect" is
> concerned, the breakage at A in your example is an unrelated breakage that
> did not persist through the history to the "Bad" commit H.

In the example above (after we know G is merged from e and F),
Those commits are old bad commit: A, B, a, b, ..., c, d (but we don't 
care about those old bad commits, we cared about latest bad commit that 
we met which is F)
It's possible that default git-bisect would jump to these old bad 
commits, and will finally find an old first bad commit
With bad-branch-first, it could help us to get away from the trouble 
that old culprit commit exist on feature1 branch for a period of time 
not fixed

Thanks
     --Shuang

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC] Add bad-branch-first option for git-bisect
  2011-01-24 10:30   ` Shuang He
  2011-01-24 10:50     ` Johannes Sixt
@ 2011-01-25  9:20     ` Christian Couder
  2011-01-26  7:22       ` Shuang He
  1 sibling, 1 reply; 14+ messages in thread
From: Christian Couder @ 2011-01-25  9:20 UTC (permalink / raw)
  To: Shuang He
  Cc: git@vger.kernel.org, apenwarr@gmail.com, Junio C Hamano,
	Johannes Sixt

On Mon, Jan 24, 2011 at 11:30 AM, Shuang He <shuang.he@intel.com> wrote:
> On 2011/1/24 17:53, Christian Couder wrote:
>>
>> Hi,
>>
>> On Mon, Jan 24, 2011 at 3:03 AM, Shuang He<shuang.he@intel.com>  wrote:
>>>
>>> Hi
>>>     The default git-bisect algorithm will jump around the commit tree,
>>> on the purpose of taking least steps to find the first culprit commit.
>>> We may find it sometime would locate a old culprit commit that we're not
>>> concerned about anymore.
>>
>> Yes, it can be a problem.
>
> I'm honored to be given so much comment :)
> Thank you

I am honored by your interest in git bisect and the fact that you
provided a patch :-)
Thanks!


>> If the quality of these branches is too bad, I think they should not
>> have been merged in the first place.
>> If they are not merged (and not marked as good), then git bisect will
>> not look at them, since it will look only at commits that are
>> ancestors of the bad commit it is given.
>>
>> Or if one is merged but it causes too many problems, then perhaps a
>> replacement commit could be used to unmerge the branch.
>>
>> Another possibility is to have in a file a list of commits that are
>> the last commits on these branches before the merge commits, and do a:
>>
>> git bisect good $(cat good_commits_file.txt)
>>
>> at the beginning of each bisection.
>>
>> So I think the long term solution in this case is not what your are
>> suggesting.
>
> Yeah, I agree that the issue I addressed above will not be a problem if all
> those branches are maintained very well.
> Actually we've implemented a automated bisect system for Intel Linux
> Graphics Driver Project, and so we'd like the system
> helps us to locate issue in an more automatic way when branches are not
> maintained as good as expected.

I think there is always a price to pay when you bisect if the branches
are not well maintained.
Maybe your algorithm could help in some cases, but my opinion is that
there will probably still be many problems and a human will often have
to take a look.

>>>         2. Some of those branches may not synchronized with main
>>> branch in time.  Say feature1 is broken when feature2 branch is created,
>>> and
>>> feature1 is fixed just a moment later after feature2 branch is created,
>>> and when feature2's development is done, and developer want to merge
>>> feature2 branch back to master branch, feature2 will be firstly
>>> synchronized to master branch tip, then merge into master.  For the same
>>> reason addressed in issue 1, this will also lead git-bisect into wrong
>>> direction.
>>
>> I am not sure what you mean by " feature2 will be firstly synchronized
>> to master branch tip", and I think this should mean a rebase that
>> would fix the bug if feature1 has already been merged into the master
>> branch.
>>
>> But anyway in this case, I think that git bisect will find that the
>> first bad commit is the last commit in the branch, just before it was
>> merged. And by looking at the branch graph it should be quite easy to
>> understand what happened.

Now I think I was wrong here, as git bisect will probably find that
the first commit in the branch (not the last one) is the first bad
commit.

[...]

>> - the name "bisectbadbranchfirst" seems wrong to me, because git
>> branches are just some special tags; "firstparentsonly" would be a
>> better name,
>
> It's recursively applying bad branch first algorithm, not just constantly
> stick to first parent.
> Given this condition:
>    A -> B -> C -> D -> E -> F -> G -> H   (master)
>         \ a  -> b -> c -> d -> e /  (feature 1)
>              \ x -> y -> z/      (feature 2)
> start with H as bad commit, and A as good commit, if y is the target bad
> commit. bad-branch-first algorithm will do it like this:
>    1. In first round stick to master branch, so it will locate G as first
> bad commit
>    2. In second round stick to feature1 branch, then it will locate d as
> first bad commit
>    3. In third round stick to feature2 branch, then it will finally locate y
> as first bad commit
> So you could see, it's always sticking to branch where current bad commit
> sit

I see. It is interesting, but why not develop a "firstparentsonly"
algorithm first?

As Avery explains in his email, it is already interesting to have a
"firstparentsonly" algorithm because some people are only interested
to know from which branch the bug comes from.
When they know that, they can just contact the relevant people and be
done with it.

And when we have a "firstparentsonly" algorithm, then your algorithm
could be just a script that repeatedly uses git bisect with the
"firstparentsonly" algorithm. And this script might be integrated in
the "contrib" directory if it not considered important to be
integrated as an algorithm into git bisect.

Thanks,
Christian.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC] Add bad-branch-first option for git-bisect
  2011-01-24 20:28   ` Avery Pennarun
@ 2011-01-26  7:11     ` Shuang He
  0 siblings, 0 replies; 14+ messages in thread
From: Shuang He @ 2011-01-26  7:11 UTC (permalink / raw)
  To: Avery Pennarun
  Cc: Christian Couder, git@vger.kernel.org, Johannes Sixt,
	Junio C Hamano

On 2011/1/25 4:28, Avery Pennarun wrote:
> On Mon, Jan 24, 2011 at 1:53 AM, Christian Couder
> <christian.couder@gmail.com>  wrote:
>> I am not opposed to an option to bisect on the first parents of the
>> bad commit only. And after a very fast look at your patch it seems to
>> be what it does. By the way Avery Pennarun's gitbuilder
>> (https://github.com/apenwarr/gitbuilder) does the same thing. So I
>> know some people are interested in such a feature.
> Just some notes on gitbuilder's algorithm, since I haven't spent the
> time to fully understand Shuang's proposal.
>
> I do understand at least one of his concerns, that is, that people
> like to do a lot of "messy" development on a branch, and when the
> branch is done, merge the whole messy branch into the "mainline".  The
> messy branch would then have a lot of commits that break a lot of
> things before fixing them again later.
>
> In a corporate environment, this method allows people to work all day,
> make frequent commits, pull from other branches at will, and never
> risk their lives by doing poorly-educated rebases.  It works pretty
> well *until* you try to bisect, at which time all these messy commits
> start to bite you.
>
> gitbuilder's bisection is a total hack around this situation, although
> it happens to work perfectly in the workflow it was designed for, thus
> making me feel clever.
>
> Basically, we push/fetch *all* the branches from *everybody* into a
> single repo, and build all of them as frequently as we can.  If you
> think about it, if you have all the branches that someone might have
> pulled/merged from, then you don't have to think of the git history as
> a whole complicated DAG; you can just think of it as a whole bunch of
> separate chunks of linear history.  Moreover, as long as people are
> careful to only pull from a branch when that branch is passing all
> tests - which you can easily see by looking at the gitbuilder console
> - then playing inside each of these chunks of linear history can help
> you figure out where particular bugs were introduced during "messy"
> branches.
>
> It also allows you a nice separation of concerns.  The owner of the
> mainline branch (the "integration manager" person) only really cares
> about which branch they merged that caused a problem, because that
> person doesn't want to fix bugs, he/she simply wants to know who owns
> the failing branch, so that person can fix *their* bug and their
> branch will merge without breaking things.
>
> So this is why gitbuilder uses "git rev-list --first-parent" during
> its "fake bisection" operation: because a different person is
> responsible for each "linear chunk" of history.
>
> Note that you have to use --no-ff when merging if you want this to
> work reliably.  But the build manager person can just remember to do
> that.  Combining --no-ff and --ff-only (which sound mutually exclusive
> but aren't) is a way to be extra specially sure.
>
> Now, if you aren't using gitbuilder, what we want from "bisection" is
> not quite the same, but let's imagine that you at least have a similar
> setup, where people *only* ever merge into the mainline by using
> --no-ff.  In that case, you'd like a bisect operation that *starts* by
> using --first-parent, which will tell you which merge caused the
> problem.  After that, you might want to bisect into the branch.
>
> (I don't actually remember if 'git bisect' understands --first-parent
> correctly.  gitbuilder doesn't exactly bisect either, but that's
> another story and not relevant right now.)
>
> I can actually imagine that there are many more projects that do what
> I'm talking about - "messy" branches that get broken and fixed over
> time, then merge into a "clean" mainline - than projects (like the
> kernel and git.git) that try to keep all branches clean at all times.
> Thus, I could see some argument that a "--first-parents first"
> bisection would actually help out a lot of people, and maybe even
> deserves to be the default.
>
> I don't really care though, I just use gitbuilder :)
>
> Have fun,
>
> Avery

Thanks for helping explaining those stuff, and also glad to learn more 
about gitbuilder :)

Thanks
     --Shuang

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC] Add bad-branch-first option for git-bisect
  2011-01-25  9:20     ` Christian Couder
@ 2011-01-26  7:22       ` Shuang He
  2011-01-26  9:44         ` Christian Couder
  0 siblings, 1 reply; 14+ messages in thread
From: Shuang He @ 2011-01-26  7:22 UTC (permalink / raw)
  To: Christian Couder
  Cc: git@vger.kernel.org, apenwarr@gmail.com, Junio C Hamano,
	Johannes Sixt

On 2011/1/25 17:20, Christian Couder wrote:
> On Mon, Jan 24, 2011 at 11:30 AM, Shuang He<shuang.he@intel.com>  wrote:
>> On 2011/1/24 17:53, Christian Couder wrote:
>>> Hi,
>>>
>>> On Mon, Jan 24, 2011 at 3:03 AM, Shuang He<shuang.he@intel.com>    wrote:
>>>> Hi
>>>>      The default git-bisect algorithm will jump around the commit tree,
>>>> on the purpose of taking least steps to find the first culprit commit.
>>>> We may find it sometime would locate a old culprit commit that we're not
>>>> concerned about anymore.
>>> Yes, it can be a problem.
>> I'm honored to be given so much comment :)
>> Thank you
> I am honored by your interest in git bisect and the fact that you
> provided a patch :-)
> Thanks!

I'm glad to see that git community is so hot.

>
>>> If the quality of these branches is too bad, I think they should not
>>> have been merged in the first place.
>>> If they are not merged (and not marked as good), then git bisect will
>>> not look at them, since it will look only at commits that are
>>> ancestors of the bad commit it is given.
>>>
>>> Or if one is merged but it causes too many problems, then perhaps a
>>> replacement commit could be used to unmerge the branch.
>>>
>>> Another possibility is to have in a file a list of commits that are
>>> the last commits on these branches before the merge commits, and do a:
>>>
>>> git bisect good $(cat good_commits_file.txt)
>>>
>>> at the beginning of each bisection.
>>>
>>> So I think the long term solution in this case is not what your are
>>> suggesting.
>> Yeah, I agree that the issue I addressed above will not be a problem if all
>> those branches are maintained very well.
>> Actually we've implemented a automated bisect system for Intel Linux
>> Graphics Driver Project, and so we'd like the system
>> helps us to locate issue in an more automatic way when branches are not
>> maintained as good as expected.
> I think there is always a price to pay when you bisect if the branches
> are not well maintained.
> Maybe your algorithm could help in some cases, but my opinion is that
> there will probably still be many problems and a human will often have
> to take a look.
>

Yes, I agree. What we trying to do is just make the machine to do more 
help for human.

>>>>          2. Some of those branches may not synchronized with main
>>>> branch in time.  Say feature1 is broken when feature2 branch is created,
>>>> and
>>>> feature1 is fixed just a moment later after feature2 branch is created,
>>>> and when feature2's development is done, and developer want to merge
>>>> feature2 branch back to master branch, feature2 will be firstly
>>>> synchronized to master branch tip, then merge into master.  For the same
>>>> reason addressed in issue 1, this will also lead git-bisect into wrong
>>>> direction.
>>> I am not sure what you mean by " feature2 will be firstly synchronized
>>> to master branch tip", and I think this should mean a rebase that
>>> would fix the bug if feature1 has already been merged into the master
>>> branch.
>>>
>>> But anyway in this case, I think that git bisect will find that the
>>> first bad commit is the last commit in the branch, just before it was
>>> merged. And by looking at the branch graph it should be quite easy to
>>> understand what happened.
> Now I think I was wrong here, as git bisect will probably find that
> the first commit in the branch (not the last one) is the first bad
> commit.
>
> [...]
>
>>> - the name "bisectbadbranchfirst" seems wrong to me, because git
>>> branches are just some special tags; "firstparentsonly" would be a
>>> better name,
>> It's recursively applying bad branch first algorithm, not just constantly
>> stick to first parent.
>> Given this condition:
>>     A ->  B ->  C ->  D ->  E ->  F ->  G ->  H   (master)
>>          \ a  ->  b ->  c ->  d ->  e /  (feature 1)
>>               \ x ->  y ->  z/      (feature 2)
>> start with H as bad commit, and A as good commit, if y is the target bad
>> commit. bad-branch-first algorithm will do it like this:
>>     1. In first round stick to master branch, so it will locate G as first
>> bad commit
>>     2. In second round stick to feature1 branch, then it will locate d as
>> first bad commit
>>     3. In third round stick to feature2 branch, then it will finally locate y
>> as first bad commit
>> So you could see, it's always sticking to branch where current bad commit
>> sit
> I see. It is interesting, but why not develop a "firstparentsonly"
> algorithm first?
>
> As Avery explains in his email, it is already interesting to have a
> "firstparentsonly" algorithm because some people are only interested
> to know from which branch the bug comes from.
> When they know that, they can just contact the relevant people and be
> done with it.
>
> And when we have a "firstparentsonly" algorithm, then your algorithm
> could be just a script that repeatedly uses git bisect with the
> "firstparentsonly" algorithm. And this script might be integrated in
> the "contrib" directory if it not considered important to be
> integrated as an algorithm into git bisect.

Sorry to reply so late, since I was on a long journey home for Chinese 
New Year vacation ;)
I agree that's also an good option.
Is it acceptable to add option to git-bisect stuff, so user could choose 
which algorithm to use at every step at will.
And we have tested previous attached patch with t6002-rev-list-bisect.sh 
and t6030-bisect-porcelain.sh, and we get:
     with bad-branch-first disabled (which is the default setting):
         t6002-rev-list-bisect.sh: # passed all 45 test(s)
         t6030-bisect-porcelain.sh: # passed all 40 test(s)
     and with bad-branch-first enabled:
         t6002-rev-list-bisect.sh: # passed all 45 test(s)
         t6030-bisect-porcelain.sh: # failed 5 among 40 test(s), and I 
have spent some time digging into those failures ,and it seems they're 
all false negative since they're using hard-coded bisect path to 
validate specific case

Thanks
     --Shuang
> Thanks,
> Christian.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC] Add bad-branch-first option for git-bisect
  2011-01-26  7:22       ` Shuang He
@ 2011-01-26  9:44         ` Christian Couder
  2011-01-26 10:40           ` Shuang He
  0 siblings, 1 reply; 14+ messages in thread
From: Christian Couder @ 2011-01-26  9:44 UTC (permalink / raw)
  To: Shuang He
  Cc: git@vger.kernel.org, apenwarr@gmail.com, Junio C Hamano,
	Johannes Sixt

On Wed, Jan 26, 2011 at 8:22 AM, Shuang He <shuang.he@intel.com> wrote:
> On 2011/1/25 17:20, Christian Couder wrote:
>>
>>>
>>> Yeah, I agree that the issue I addressed above will not be a problem if
>>> all
>>> those branches are maintained very well.
>>> Actually we've implemented a automated bisect system for Intel Linux
>>> Graphics Driver Project, and so we'd like the system
>>> helps us to locate issue in an more automatic way when branches are not
>>> maintained as good as expected.
>>
>> I think there is always a price to pay when you bisect if the branches
>> are not well maintained.
>> Maybe your algorithm could help in some cases, but my opinion is that
>> there will probably still be many problems and a human will often have
>> to take a look.
>>
>
> Yes, I agree. What we trying to do is just make the machine to do more help
> for human.

Yeah, this is the way to go. And by the way I am happy to know that
you have implemented an automated bisect system. That's great and I
hope it already helps.

>>>> - the name "bisectbadbranchfirst" seems wrong to me, because git
>>>> branches are just some special tags; "firstparentsonly" would be a
>>>> better name,
>>>
>>> It's recursively applying bad branch first algorithm, not just constantly
>>> stick to first parent.
>>> Given this condition:
>>>    A ->  B ->  C ->  D ->  E ->  F ->  G ->  H   (master)
>>>         \ a  ->  b ->  c ->  d ->  e /  (feature 1)
>>>              \ x ->  y ->  z/      (feature 2)
>>> start with H as bad commit, and A as good commit, if y is the target bad
>>> commit. bad-branch-first algorithm will do it like this:
>>>    1. In first round stick to master branch, so it will locate G as first
>>> bad commit
>>>    2. In second round stick to feature1 branch, then it will locate d as
>>> first bad commit
>>>    3. In third round stick to feature2 branch, then it will finally
>>> locate y
>>> as first bad commit
>>> So you could see, it's always sticking to branch where current bad commit
>>> sit
>>
>> I see. It is interesting, but why not develop a "firstparentsonly"
>> algorithm first?
>>
>> As Avery explains in his email, it is already interesting to have a
>> "firstparentsonly" algorithm because some people are only interested
>> to know from which branch the bug comes from.
>> When they know that, they can just contact the relevant people and be
>> done with it.
>>
>> And when we have a "firstparentsonly" algorithm, then your algorithm
>> could be just a script that repeatedly uses git bisect with the
>> "firstparentsonly" algorithm. And this script might be integrated in
>> the "contrib" directory if it not considered important to be
>> integrated as an algorithm into git bisect.
>
> Sorry to reply so late, since I was on a long journey home for Chinese New
> Year vacation ;)

No problem. I am not in a hurry at all. In fact I don't have much time
these days so I reply very late too.

> I agree that's also an good option.
> Is it acceptable to add option to git-bisect stuff, so user could choose
> which algorithm to use at every step at will.

Are you sure it is needed to be able to change the algorithm at every step?

This means that you would like a new "git bisect strategy <strategy>"
subcommand ?

First I thought that we could just add a "--strategy <strategy>"
option to "git bisect start".
But anyway, I think it should be easy to add afterward, and it can be
done in a separated patch that can be discussed on its own.

> And we have tested previous attached patch with t6002-rev-list-bisect.sh and
> t6030-bisect-porcelain.sh, and we get:
>    with bad-branch-first disabled (which is the default setting):
>        t6002-rev-list-bisect.sh: # passed all 45 test(s)
>        t6030-bisect-porcelain.sh: # passed all 40 test(s)
>    and with bad-branch-first enabled:
>        t6002-rev-list-bisect.sh: # passed all 45 test(s)
>        t6030-bisect-porcelain.sh: # failed 5 among 40 test(s), and I have
> spent some time digging into those failures ,and it seems they're all false
> negative since they're using hard-coded bisect path to validate specific
> case

Yes, there are some hard coded commits that depend on the algorithm.
Anyway I did not look in depth at your patch yet, and as I said it
would be better if you could split it into a patch series where a
"firstparentsonly" algorithm is implemented first.
This way it will be easier to review, and we can start to integrate
some non controversial features, and then discuss the other ones on
their own merit.

Thanks in advance,
Christian.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC] Add bad-branch-first option for git-bisect
  2011-01-26  9:44         ` Christian Couder
@ 2011-01-26 10:40           ` Shuang He
  0 siblings, 0 replies; 14+ messages in thread
From: Shuang He @ 2011-01-26 10:40 UTC (permalink / raw)
  To: Christian Couder
  Cc: git@vger.kernel.org, apenwarr@gmail.com, Junio C Hamano,
	Johannes Sixt

On 2011/1/26 17:44, Christian Couder wrote:
> On Wed, Jan 26, 2011 at 8:22 AM, Shuang He<shuang.he@intel.com>  wrote:
>> On 2011/1/25 17:20, Christian Couder wrote:
>>>> Yeah, I agree that the issue I addressed above will not be a problem if
>>>> all
>>>> those branches are maintained very well.
>>>> Actually we've implemented a automated bisect system for Intel Linux
>>>> Graphics Driver Project, and so we'd like the system
>>>> helps us to locate issue in an more automatic way when branches are not
>>>> maintained as good as expected.
>>> I think there is always a price to pay when you bisect if the branches
>>> are not well maintained.
>>> Maybe your algorithm could help in some cases, but my opinion is that
>>> there will probably still be many problems and a human will often have
>>> to take a look.
>>>
>> Yes, I agree. What we trying to do is just make the machine to do more help
>> for human.
> Yeah, this is the way to go. And by the way I am happy to know that
> you have implemented an automated bisect system. That's great and I
> hope it already helps.
>
>>>>> - the name "bisectbadbranchfirst" seems wrong to me, because git
>>>>> branches are just some special tags; "firstparentsonly" would be a
>>>>> better name,
>>>> It's recursively applying bad branch first algorithm, not just constantly
>>>> stick to first parent.
>>>> Given this condition:
>>>>     A ->    B ->    C ->    D ->    E ->    F ->    G ->    H   (master)
>>>>          \ a  ->    b ->    c ->    d ->    e /  (feature 1)
>>>>               \ x ->    y ->    z/      (feature 2)
>>>> start with H as bad commit, and A as good commit, if y is the target bad
>>>> commit. bad-branch-first algorithm will do it like this:
>>>>     1. In first round stick to master branch, so it will locate G as first
>>>> bad commit
>>>>     2. In second round stick to feature1 branch, then it will locate d as
>>>> first bad commit
>>>>     3. In third round stick to feature2 branch, then it will finally
>>>> locate y
>>>> as first bad commit
>>>> So you could see, it's always sticking to branch where current bad commit
>>>> sit
>>> I see. It is interesting, but why not develop a "firstparentsonly"
>>> algorithm first?
>>>
>>> As Avery explains in his email, it is already interesting to have a
>>> "firstparentsonly" algorithm because some people are only interested
>>> to know from which branch the bug comes from.
>>> When they know that, they can just contact the relevant people and be
>>> done with it.
>>>
>>> And when we have a "firstparentsonly" algorithm, then your algorithm
>>> could be just a script that repeatedly uses git bisect with the
>>> "firstparentsonly" algorithm. And this script might be integrated in
>>> the "contrib" directory if it not considered important to be
>>> integrated as an algorithm into git bisect.
>> Sorry to reply so late, since I was on a long journey home for Chinese New
>> Year vacation ;)
> No problem. I am not in a hurry at all. In fact I don't have much time
> these days so I reply very late too.
>
>> I agree that's also an good option.
>> Is it acceptable to add option to git-bisect stuff, so user could choose
>> which algorithm to use at every step at will.
> Are you sure it is needed to be able to change the algorithm at every step?

I don't think it's needed, it would just give user more control over the 
algorithm.

> This means that you would like a new "git bisect strategy<strategy>"
> subcommand ?
>
> First I thought that we could just add a "--strategy<strategy>"
> option to "git bisect start".
> But anyway, I think it should be easy to add afterward, and it can be
> done in a separated patch that can be discussed on its own.

Yeah, agree. We could discuss this later

>> And we have tested previous attached patch with t6002-rev-list-bisect.sh and
>> t6030-bisect-porcelain.sh, and we get:
>>     with bad-branch-first disabled (which is the default setting):
>>         t6002-rev-list-bisect.sh: # passed all 45 test(s)
>>         t6030-bisect-porcelain.sh: # passed all 40 test(s)
>>     and with bad-branch-first enabled:
>>         t6002-rev-list-bisect.sh: # passed all 45 test(s)
>>         t6030-bisect-porcelain.sh: # failed 5 among 40 test(s), and I have
>> spent some time digging into those failures ,and it seems they're all false
>> negative since they're using hard-coded bisect path to validate specific
>> case
> Yes, there are some hard coded commits that depend on the algorithm.
> Anyway I did not look in depth at your patch yet, and as I said it
> would be better if you could split it into a patch series where a
> "firstparentsonly" algorithm is implemented first.
> This way it will be easier to review, and we can start to integrate
> some non controversial features, and then discuss the other ones on
> their own merit.
>
> Thanks in advance,
> Christian.

Thanks for the good suggestion, I'll start the work soon.

Thanks
     --Shuang

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2011-01-26 10:40 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-01-24  2:03 [RFC] Add bad-branch-first option for git-bisect Shuang He
2011-01-24  2:05 ` [PATCH] add config option core.bisectbadbranchfirst Shuang He
2011-01-24  9:53 ` [RFC] Add bad-branch-first option for git-bisect Christian Couder
2011-01-24 10:30   ` Shuang He
2011-01-24 10:50     ` Johannes Sixt
2011-01-24 11:05       ` Shuang He
2011-01-24 20:04         ` Junio C Hamano
2011-01-25  3:27           ` Shuang He
2011-01-25  9:20     ` Christian Couder
2011-01-26  7:22       ` Shuang He
2011-01-26  9:44         ` Christian Couder
2011-01-26 10:40           ` Shuang He
2011-01-24 20:28   ` Avery Pennarun
2011-01-26  7:11     ` Shuang He

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).