git@vger.kernel.org mailing list mirror (one of many)
From: Christian Couder <christian.couder@gmail.com>
To: Alex Hoffman <spec@gal.ro>
Cc: Johannes Sixt <j6t@kdbg.org>, Stephan Beyer <s-beyer@gmx.net>,
	git <git@vger.kernel.org>
Subject: Re: Git bisect does not find commit introducing the bug
Date: Sun, 19 Feb 2017 14:07:10 +0100
Message-ID: <CAP8UFD2R94sPCd5i8NF1oZn+g8X6oYRqP7qYftmny2iXwh59Hw@mail.gmail.com>
In-Reply-To: <CAMX8fZVeAEJ5tfCO_4Pebnq=rysaJ2xDMjH-9pjmPeF4FziLFw@mail.gmail.com>

On Sun, Feb 19, 2017 at 12:32 PM, Alex Hoffman <spec@gal.ro> wrote:
>> At the end of the git-bisect man page (in the SEE ALSO section) there
>> is a link to https://github.com/git/git/blob/master/Documentation/git-bisect-lk2009.txt
>> which has a lot of details about how bisect works.
>
> Thanks for pointing out the SEE ALSO section. I think it makes sense
> to include/describe the entire algorithm in the man page itself,
> although I am not sure whether the graphs would be always correctly
> visually represented in the man page format.

Describing the entire algorithm would probably take a lot of space, as
it can be quite complex in some cases and is difficult to understand
without graphs. Maybe we could describe it, or some parts of it, in a
separate document and link to that document from different places in
the man page.
Anyway, feel free to send patches.

>> The goal is to find the first bad commit, which is a commit that has
>> only good parents.
>
> OK, bisect's mission is more exact than I thought, which is good.

Good that you seem to agree with this goal.

>> As o1 is an ancestor of G, then o1 is considered good by the bisect algorithm.
>> If it was bad, it would mean that there is a transition from bad to
>> good between o1 and G.
>> But when a good commit is an ancestor of the bad commit, git bisect
>> makes the assumption that there is no transition from bad to good in
>> the graph.
>
> The assumption that there is no transition from bad to good in the
> graph did not hold in my example and it does not hold when a feature
> was recently introduced and gets broken relatively shortly afterwards.
> But I consider it easy to change the algorithm not to assume, but
> rather to check it.

I don't think the default algorithm will change soon, but there have
been discussions for a long time about adding options to use different
algorithms.

For example people have been discussing a "--first-parent" option for
many years as well as recently. It would bisect only along the first
parents of the involved commits, and it could help find the merge
commit that introduced a bug in the mainline.
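
To illustrate the idea, here is a minimal Python sketch of bisecting
only along first parents. It is not how git implements anything; the
graph, the commit names (m0..m4, t1, t2, M2) and the helper functions
are all hypothetical, and the first entry of each parent list is
assumed to be the first parent.

```python
def first_parent_chain(bad, good, parents):
    """Follow first parents from bad back to good; return them oldest first.

    The returned list excludes the good commit and ends with the bad one.
    """
    chain, commit = [], bad
    while commit != good:
        chain.append(commit)
        if not parents[commit]:
            raise ValueError("good commit is not on the first-parent chain")
        commit = parents[commit][0]  # first parent only
    chain.reverse()
    return chain


def bisect_first_parent(bad, good, parents, is_bad):
    """Binary search on the first-parent chain for the first bad commit."""
    chain = first_parent_chain(bad, good, parents)
    lo, hi = 0, len(chain) - 1  # chain[hi] is the known bad commit
    while lo < hi:
        mid = (lo + hi) // 2
        if is_bad(chain[mid]):
            hi = mid
        else:
            lo = mid + 1
    return chain[lo]


# Hypothetical history: topic branch t1--t2 is merged into mainline by M2.
parents = {
    "m0": [], "m1": ["m0"],
    "t1": ["m1"], "t2": ["t1"],
    "M2": ["m1", "t2"],
    "m3": ["M2"], "m4": ["m3"],
}
bad_commits = {"t2", "M2", "m3", "m4"}  # the bug was introduced in t2

found = bisect_first_parent("m4", "m0", parents, lambda c: c in bad_commits)
print(found)
```

In this sketch the search never tests t1 or t2; it reports the merge
commit on the mainline through which the bug arrived.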

>> git bisect makes some assumptions that are true most of the time, so
>> in practice it works well most of the time.
>
> Whatever the definition of "most of the time" everyone might have, I
> think there is room for improvement.

So feel free to send patches that would implement an option with the
improvements you want.

> Below I am trying to make a small
> change to the current algorithm in order to deal with the assumption
> that sometimes does not hold (e.g. in my example), by explicitly
> checking it.
>
>> --o1--o2--o3--G--X1
>>     \                \
>>      x1--x2--x3--x4--X2--B--
>>       \              /
>>        y1--y2--y3
>
> Step 1a. (Unchanged) keep only the commits that:
>
>         a) are ancestors of the "bad" commit (including the "bad" commit itself),
>         b) are not ancestors of a "good" commit (excluding the "good" commits).
>
> The following graph results:
>       x1--x2--x3--x4--X2--B--
>        \              /
>         y1--y2--y3

I would say that the above graph is missing X1.
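
Step 1a can be checked with a small Python sketch. The parent map
below is my reading of the graph quoted above; in particular I assume
that X2 merges x4, X1 and y3, which is a guess from the drawing. The
result does not depend on where exactly y3 is merged. The helper names
are made up for this example.

```python
# Parent map for the example graph (commit -> list of parents).
# Assumption: X2 merges x4, X1 and y3.
PARENTS = {
    "o1": [], "o2": ["o1"], "o3": ["o2"], "G": ["o3"], "X1": ["G"],
    "x1": ["o1"], "x2": ["x1"], "x3": ["x2"], "x4": ["x3"],
    "y1": ["x1"], "y2": ["y1"], "y3": ["y2"],
    "X2": ["x4", "X1", "y3"], "B": ["X2"],
}


def ancestors(commit, parents):
    """Return the commit itself plus everything reachable through parents."""
    seen, stack = set(), [commit]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents[current])
    return seen


def candidates(bad, goods, parents):
    """Commits that are ancestors of bad but not ancestors of any good commit."""
    excluded = set()
    for good in goods:
        excluded |= ancestors(good, parents)  # includes the good commit itself
    return ancestors(bad, parents) - excluded


result = candidates("B", ["G"], PARENTS)
print(sorted(result))
```

X1 is an ancestor of B and not an ancestor of G, so it stays in the
set of commits to search.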

> Step 1b. (New) Mark all root commits of the resulting graph (i.e.
> commits without parents) as unconfirmed (unconfirmed=node that has
> only bad parents). Remove all root commits that user already confirmed
> (e.g if user already marked its parent as good right before starting
> bisect run). For every unconfirmed root commit check if it has any
> good parents. In the example above check whether x1 has good parents.

I think I understand the above...

>      If the current root element has any parents and none of them is
> good, we can delete all paths from it up to the next commit that
> has a parent among the ancestors of GOOD. In the example above, this
> deletes the paths x1-x3 and x1-y3. Also add the new resulting root
> commits to the list of unconfirmed commits (commit x4).
>      Otherwise mark it as confirmed.

... but I don't understand the logic of the above. If the root element
has a bad parent, then it means that the "first bad commit" is either
the bad parent or one of its ancestors, so it is not logical to delete
it. In your example if x1 has one bad parent, this bad parent and its
ancestors should be included in the search of the first bad commit.

Otherwise the goal is not any more to find the first bad commit.
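
As a small Python sketch of that definition (a bad commit whose
parents are all good), with a hypothetical linear history p0, p1, x1,
x2 in which the bug appears in p1, the parent of x1:

```python
def first_bad_commits(parents, is_bad):
    """Return the commits that are bad while none of their parents is bad."""
    return sorted(
        commit
        for commit, commit_parents in parents.items()
        if is_bad[commit] and not any(is_bad[p] for p in commit_parents)
    )


# Hypothetical history p0 <- p1 <- x1 <- x2; names and states are made up.
parents = {"p0": [], "p1": ["p0"], "x1": ["p1"], "x2": ["x1"]}
is_bad = {"p0": False, "p1": True, "x1": True, "x2": True}

print(first_bad_commits(parents, is_bad))  # ['p1']
```

Here x1 is bad but has a bad parent, so the answer is p1. Dropping x1
and its ancestors from the search would make p1 impossible to find.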

PS: I saw that you have just sent another version of the algorithm,
but I don't want to take a look at it right now. Anyway I am keeping
my above comments as they might still be useful.

> Step2. Continue the existing algorithm.
>
>
> If this improvement works (i.e. you do not find any bugs in it and it
> is feasible to implement, which it seems to me to be)

As you describe it, I don't think it is compatible with the goal of
finding the first bad commit.
Also there are many things that are feasible to implement, but that
doesn't mean that someone will soon make the effort to implement them
in a way that looks good enough to be deemed worth merging into the
current code base.

> the following would be
> its advantages:
> 1. One assumption fewer, as we explicitly check the assumption.

Checking can be costly. If the probability that the check will fail is
very low, while the cost of checking is high, it is less costly on
average to not check.
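
A back-of-the-envelope version of this argument in Python, with
made-up numbers: costs are counted in extra test runs, and the
probability and both costs below are purely hypothetical.

```python
def expected_cost_without_check(p_fail, recovery_cost):
    """Average extra cost when the assumption is not checked up front.

    We only pay recovery_cost in the rare case where the assumption fails.
    """
    return p_fail * recovery_cost


def break_even_probability(check_cost, recovery_cost):
    """Failure probability above which always checking becomes cheaper."""
    return check_cost / recovery_cost


# Hypothetical numbers: the assumption fails in 1% of bisections,
# checking always costs 5 extra tests, recovering costs 20 extra tests.
p_fail, check_cost, recovery_cost = 0.01, 5, 20

no_check = expected_cost_without_check(p_fail, recovery_cost)
print(no_check < check_cost)
print(break_even_probability(check_cost, recovery_cost))
```

With these numbers, not checking is cheaper on average; checking would
only pay off if the assumption failed in more than a quarter of all
bisections.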

> 2. It might be quicker, because we delete parts of graph that cannot
> contain transitions.

I don't agree that it's a good idea to delete what you suggest above.
Or if you think that the goal should not be to find the "first bad
commit" in the above case, then you should explain what the goal
should be.

> 3. It returns more exact results.

Yeah, but checking every commit related to the good and bad commits
would also return more exact results. (This can probably be done using
`git rebase --exec ...` by the way.) One could even print a nice graph
with all the good and bad commits. The problem is that it would not be
efficient. If git bisect makes some assumptions, it is because they
have been deemed reasonable and they have worked well in practice.
It's also because the goal of git bisect is to be efficient, otherwise
there would be no point in using a binary search algorithm in the
first place.
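
To give a rough idea of the difference in Python: on a linear history
of n commits, testing every commit needs n tests, while a binary
search needs about log2(n). This sketch assumes a linear history and a
single transition from good to bad; bisecting a real commit graph is
more involved, and the function name is made up.

```python
import math


def count_bisect_tests(n, first_bad):
    """Count the tests a binary search runs on a linear history.

    Commits are indexed 0..n-1, commit n-1 is already known to be bad,
    and every commit from first_bad onwards is bad.
    """
    tests, lo, hi = 0, 0, n - 1
    while lo < hi:
        mid = (lo + hi) // 2
        tests += 1  # one build-and-test run for commit mid
        if mid >= first_bad:
            hi = mid
        else:
            lo = mid + 1
    assert lo == first_bad
    return tests


n = 1000
worst_case = max(count_bisect_tests(n, fb) for fb in range(n))
print("exhaustive:", n, "bisect worst case:", worst_case)
print("log2 bound:", math.ceil(math.log2(n)))
```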
