From: "SZEDER Gábor" <szeder.dev@gmail.com>
To: git@vger.kernel.org
Cc: Junio C Hamano <gitster@pobox.com>
Subject: Re: [PATCH] bisect: loosen halfway() check for a large number of commits
Date: Thu, 22 Oct 2020 12:40:07 +0200 [thread overview]
Message-ID: <20201022104007.GE24813@szeder.dev> (raw)
In-Reply-To: <20201022103806.26680-1-szeder.dev@gmail.com>
This is only RFC, but forgot to mark it as such in Subject: line,
sorry.
On Thu, Oct 22, 2020 at 12:38:06PM +0200, SZEDER Gábor wrote:
> 'git bisect start ...' and subsequent 'git bisect (good|bad)' commands
> can take quite a while when the given/remaining revision range between
> good and bad commits is big and contains a lot of merge commits, e.g.
> in git.git:
>
> $ git rev-list --count v1.6.0..v2.28.0
> 44284
> $ time git bisect start v2.28.0 v1.6.0
> Bisecting: 22141 revisions left to test after this (roughly 15 steps)
> [e197c21807dacadc8305250baa0b9228819189d4] unable_to_lock_die(): rename function from unable_to_lock_index_die()
>
> real 0m15.472s
> user 0m15.220s
> sys 0m0.255s
>
> The majority of the runtime is spent in do_find_bisection(), where we
> try to find a commit as close as possible to the halfway point between
> the bad and good revisions, i.e. a commit from which the number of
> reachable commits that are in the good-bad range is half the total
> number of commits in that range. So we count how many commits are
> reachable in the good-bad range for each commit in that range, which
> is quick and easy for a linear history, even over 300k commits in a
> linear range are handled in ~0.3s on my machine. Alas, handling merge
> commits is non-trivial and quite expensive as the algorithm used seems
> to be quadratic, causing the long runtime shown above.
>
> Interestingly, look at what a big difference one additional commit
> can make:
>
> $ git rev-list --count v1.6.0^..v2.28.0
> 44285
> $ time git bisect start v2.28.0 v1.6.0^
> Bisecting: 22142 revisions left to test after this (roughly 15 steps)
> [565301e41670825ceedf75220f2918ae76831240] Sync with 2.1.2
>
> real 0m5.848s
> user 0m5.600s
> sys 0m0.252s
>
> The difference is caused by one of the optimizations attempting to cut
> down the runtime added in 1c4fea3a40 (git-rev-list --bisect:
> optimization, 2007-03-21):
>
> Another small optimization is whenever we find a half-way commit
> (that is, a commit that can reach exactly half of the commits),
> we stop giving counts to remaining commits, as we will not find
> any better commit than we just found.
>
> In this second 'git bisect start' command we happen to find a commit
> exactly at the halfway point and can return early, but in the first
> case there is no such commit, so we can't return early and end up
> counting the number of reachable commits from all commits in the
> good-bad range.
>
> However, when we have thousands of commits it's not all that important
> to find the _exact_ halfway point, a few commits more or less doesn't
> make any real difference for the bisection.
>
> So let's loosen the halfway check to consider commits within about
> 0.1% of the exact halfway point as halfway as well. This will allow
> us to return early on a bigger good-bad range, even when there is no
> commit exactly at the halfway point, thereby reducing the runtime of
> the first command above considerably, from ~15s to 4.901s.
> Furthermore, even if there is a commit exactly at the halfway point,
> we might still stumble upon a commit within that 0.1% range before
> finding the exact halfway point, allowing us to return a bit earlier,
> slightly reducing the runtime of the second command from 5.848s to
> 5.058s. Note that this change doesn't affect good-bad ranges
> containing ~2000 commits or less, because that 0.1% tolerance becomes
> zero due to integer arithmetic; however, if the range is that small
> then counting the reachable commits for all commits is already fast
> enough anyway.
>
> Naturally, this will likely change which commits get picked at each
> bisection step, and, in turn, might change how many bisection steps
> are necessary to find the first bad commit. If the number of
> necessary bisection steps were to increase often, then this change
> could backfire, because building and testing at each step might take
> much longer than the time spared. OTOH, if the number of steps were
> to decrease, then it would be a double win.
>
> So I ran some tests to see how often that happens: picked random good
> and bad starting revisions at least 50k commits apart and a random
> first bad commit in between in git.git, and used 'git bisect run git
> merge-base --is-ancestor HEAD $first_bad_commit' to check the number
> of necessary bisection steps. After repeating all this 1000 times
> both with and without this patch I found that:
>
> - 146 cases needed one more bisection step than before, 149 cases
> needed one less step, while in the remaining 705 cases the number
> of steps didn't change. So the number of bisection steps does
> indeed change in a non-negligible number of cases, but it seems
> that the average number of steps doesn't change in the long run.
>
> - The first 'git bisect start' command got over 3x faster in 456
> cases, so this "no commit at the exact halfway point" case seems
> to be common enough to care about.
>
> [TODO:
> - Update comments at callsites mentioning "exact halfway".
> - Rename function to approx_halfway(), perhaps?]
> ---
> bisect.c | 15 +++++++++++++--
> 1 file changed, 13 insertions(+), 2 deletions(-)
>
> diff --git a/bisect.c b/bisect.c
> index f5b1368128..1857ce4c75 100644
> --- a/bisect.c
> +++ b/bisect.c
> @@ -105,6 +105,8 @@ static int count_interesting_parents(struct commit *commit, unsigned bisect_flag
>
> static inline int halfway(struct commit_list *p, int nr)
> {
> + int diff;
> +
> /*
> * Don't short-cut something we are not going to return!
> */
> @@ -113,13 +115,22 @@ static inline int halfway(struct commit_list *p, int nr)
> if (DEBUG_BISECT)
> return 0;
> /*
> - * 2 and 3 are halfway of 5.
> + * For small number of commits 2 and 3 are halfway of 5, and
> * 3 is halfway of 6 but 2 and 4 are not.
> */
> - switch (2 * weight(p) - nr) {
> + diff = 2 * weight(p) - nr;
> + switch (diff) {
> case -1: case 0: case 1:
> return 1;
> default:
> + /*
> + * For large number of commits we are not so strict, it's
> + * good enough if it's within ~0.1% of the halfway point,
> + * e.g. 5000 is exactly halfway of 10000, but we consider
> + * the values [4996, 5004] as halfway as well.
> + */
> + if (abs(diff) < nr / 1024)
> + return 1;
> return 0;
> }
> }
> --
> 2.29.0.470.g6462f21d4e
>
next prev parent reply other threads:[~2020-10-22 10:40 UTC|newest]
Thread overview: 7+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-10-22 10:38 [PATCH] bisect: loosen halfway() check for a large number of commits SZEDER Gábor
2020-10-22 10:40 ` SZEDER Gábor [this message]
2020-10-22 17:18 ` Junio C Hamano
2020-10-24 7:41 ` Christian Couder
2020-10-25 18:01 ` SZEDER Gábor
2020-11-12 16:19 ` [PATCH v2] " SZEDER Gábor
2020-11-12 18:23 ` Junio C Hamano
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
List information: http://vger.kernel.org/majordomo-info.html
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20201022104007.GE24813@szeder.dev \
--to=szeder.dev@gmail.com \
--cc=git@vger.kernel.org \
--cc=gitster@pobox.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
Code repositories for project(s) associated with this public inbox
https://80x24.org/mirrors/git.git
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).