git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Derrick Stolee <derrickstolee@github.com>
To: Taylor Blau <me@ttaylorr.com>, git@vger.kernel.org
Cc: Jeff King <peff@peff.net>, Junio C Hamano <gitster@pobox.com>
Subject: Re: [PATCH 0/3] pack-bitmap: boundary-based bitmap traversal
Date: Tue, 25 Apr 2023 14:06:25 -0400	[thread overview]
Message-ID: <f25c6234-83c0-fa49-85f5-9005e312b8a3@github.com> (raw)
In-Reply-To: <cover.1682380788.git.me@ttaylorr.com>

On 4/24/2023 8:00 PM, Taylor Blau wrote:
> Here is a short (but dense) series that I worked on towards the end of
> 2021 with Peff.
> 
> Its goal is to improve the bitmap traversal we implement in
> prepare_bitmap_walk(). Specifically, it avoids building up a complete
> bitmap of the "haves" side, and instead uses a combination of (a)
> UNINTERESTING commits between the tips and boundary, and (b) the
> boundary itself.
> 
> The gory details are laid out in 3/3, but the high-level overview of the
> new algorithm to compute the set of objects between "haves" and "wants"
> using bitmaps is:
> 
>   1. Build a (partial) bitmap of the haves side by first OR-ing any
>      bitmap(s) that already exist for UNINTERESTING commits between the
>      haves and the boundary.
> 
>   2. For each commit along the boundary, add it as a fill-in traversal
>      tip (where the traversal terminates once an existing bitmap is
>      found), and perform fill-in traversal.
> 
>   3. Build up a complete bitmap of the wants side as usual, stopping any
>      time we intersect the (partial) haves side.

In other words: this generates something closer to the object set in the
non-bitmapped object walk. The only difference is that the new bitmapped
algorithm will see objects that were re-introduced across the boundary
(say, a blob was reverted to its older mode).
 
> This improves many cases where using bitmaps was significantly *slower*
> than regular, non-bitmap traversal. In some instances, using bitmaps is
> still slower than the non-bitmap case, but it is a substantial
> improvement over the classic bitmap walk.> 
>     $ ours="$(git branch --show-current)"
>       argv="--count --objects $ours --not --exclude=$ours --branches"
>       hyperfine \
>         -n 'no bitmaps' "git.compile rev-list $argv" \
>         -n 'existing bitmap traversal' "git rev-list --use-bitmap-index $argv" \
>         -n 'this commit' "git.compile rev-list --use-bitmap-index $argv"
>     Benchmark 1: no bitmaps
>       Time (mean ± σ):      15.1 ms ±   4.1 ms    [User: 8.1 ms, System: 6.9 ms]
>       Range (min … max):     7.4 ms …  21.8 ms    131 runs
> 
>     Benchmark 2: existing bitmap traversal
>       Time (mean ± σ):      82.6 ms ±   9.2 ms    [User: 63.6 ms, System: 19.0 ms]
>       Range (min … max):    73.8 ms … 105.4 ms    28 runs
> 
>     Benchmark 3: this commit
>       Time (mean ± σ):      19.8 ms ±   3.1 ms    [User: 13.0 ms, System: 6.8 ms]
>       Range (min … max):    17.7 ms …  38.6 ms    158 runs
> 
>     Summary
>       'no bitmaps' ran
>         1.31 ± 0.41 times faster than 'this commit'
>         5.49 ± 1.62 times faster than 'existing bitmap traversal'
> 
> In a large repository on GitHub.com, the timings to compute the objects
> unique to "master", as in:
> 
>     $ git rev-list --count --objects master --not --exclude=master --branches
> 
> improve as follows:
> 
>   - 1.012 sec (without bitmaps)
>   - 29.571 sec (with the existing bitmap walk)
>   - 2.279 sec (with these patches)

For my curiosity, and since you already have a test environment set up,
could you redo the "without bitmaps" case with pack.useSparse true and
false? When the option was created and defaulted to true, we never
really explored comparing it to the bitmap case. In fact, I assumed the
bitmap case was faster in important cases like this (where there is a
substantial difference in object counts), but your data is surprising me
that the sparse algorithm is outperforming bitmaps, even with this new
algorithm.

The main question I'd like to ask is: is pack.useSparse contributing
in an important way to the performance here?
 
I'll go poking into the patches now.

Thanks,
-Stolee

  parent reply	other threads:[~2023-04-25 18:07 UTC|newest]

Thread overview: 51+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-04-25  0:00 [PATCH 0/3] pack-bitmap: boundary-based bitmap traversal Taylor Blau
2023-04-25  0:00 ` [PATCH 1/3] revision: support tracking uninteresting commits Taylor Blau
2023-04-25 18:15   ` Derrick Stolee
2023-05-03 21:48     ` Taylor Blau
2023-05-04 13:46       ` Derrick Stolee
2023-05-03 22:08     ` Taylor Blau
2023-05-04 13:59       ` Derrick Stolee
2023-05-05 17:30         ` Taylor Blau
2023-04-25 18:22   ` Junio C Hamano
2023-04-25 18:48     ` Taylor Blau
2023-04-25  0:00 ` [PATCH 2/3] pack-bitmap.c: extract `fill_in_bitmap()` Taylor Blau
2023-04-25 18:32   ` Junio C Hamano
2023-04-25 18:51     ` [PATCH 2/3] pack-bitmap.c: extract `fill_in_bitmap()`t Taylor Blau
2023-04-25  0:00 ` [PATCH 3/3] pack-bitmap.c: use commit boundary during bitmap traversal Taylor Blau
2023-04-25 18:52   ` Junio C Hamano
2023-05-02 21:31     ` Felipe Contreras
2023-05-03 21:42     ` Taylor Blau
2023-05-03 21:58       ` Junio C Hamano
2023-04-25 18:53   ` Derrick Stolee
2023-04-25 18:02 ` [PATCH 0/3] pack-bitmap: boundary-based " Junio C Hamano
2023-04-25 18:30   ` Derrick Stolee
2023-04-25 18:57     ` Taylor Blau
2023-04-25 19:52       ` Derrick Stolee
2023-05-03 21:43         ` Taylor Blau
2023-04-25 18:06 ` Derrick Stolee [this message]
2023-04-25 19:01   ` Taylor Blau
2023-04-25 20:27     ` Derrick Stolee
2023-05-01 22:08 ` Junio C Hamano
2023-05-02 23:52   ` Taylor Blau
2023-05-05 17:27 ` [PATCH v2 0/2] " Taylor Blau
2023-05-05 17:27   ` [PATCH v2 1/2] pack-bitmap.c: extract `fill_in_bitmap()` Taylor Blau
2023-05-05 17:27   ` [PATCH v2 2/2] pack-bitmap.c: use commit boundary during bitmap traversal Taylor Blau
2023-05-05 18:13     ` Derrick Stolee
2023-05-05 18:43       ` Taylor Blau
2023-05-05 17:33   ` [PATCH v2 0/2] pack-bitmap: boundary-based " Junio C Hamano
2023-05-05 17:59   ` Derrick Stolee
2023-05-05 18:46     ` [PATCH v2 0/2] pack-bitmap: boundary-based bitmap traversalt Taylor Blau
2023-05-05 20:45       ` Derrick Stolee
2023-05-08 17:38 ` [PATCH v3 0/3] pack-bitmap: boundary-based bitmap traversal Taylor Blau
2023-05-08 17:38   ` [PATCH v3 1/3] object: add object_array initializer helper function Taylor Blau
2023-05-08 17:38   ` [PATCH v3 2/3] pack-bitmap.c: extract `fill_in_bitmap()` Taylor Blau
2023-05-08 17:38   ` [PATCH v3 3/3] pack-bitmap.c: use commit boundary during bitmap traversal Taylor Blau
2023-05-08 20:53     ` Derrick Stolee
2023-05-08 22:12       ` Taylor Blau
2023-05-10 22:55   ` [PATCH v3 0/3] pack-bitmap: boundary-based " Junio C Hamano
2023-05-10 23:10     ` Taylor Blau
2023-05-11 15:23       ` Derrick Stolee
2023-06-08 16:25 ` [PATCH v4 " Taylor Blau
2023-06-08 16:25   ` [PATCH v4 1/3] object: add object_array initializer helper function Taylor Blau
2023-06-08 16:25   ` [PATCH v4 2/3] pack-bitmap.c: extract `fill_in_bitmap()` Taylor Blau
2023-06-08 16:25   ` [PATCH v4 3/3] pack-bitmap.c: use commit boundary during bitmap traversal Taylor Blau

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=f25c6234-83c0-fa49-85f5-9005e312b8a3@github.com \
    --to=derrickstolee@github.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=me@ttaylorr.com \
    --cc=peff@peff.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).