git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Derrick Stolee <derrickstolee@github.com>
To: Taylor Blau <me@ttaylorr.com>
Cc: git@vger.kernel.org, Jeff King <peff@peff.net>,
	Junio C Hamano <gitster@pobox.com>
Subject: Re: [PATCH 0/3] pack-bitmap: boundary-based bitmap traversal
Date: Tue, 25 Apr 2023 16:27:21 -0400	[thread overview]
Message-ID: <aa162f0f-022d-7216-51c9-c53884362628@github.com> (raw)
In-Reply-To: <ZEgjiAtK8O0+dbht@nand.local>

On 4/25/2023 3:01 PM, Taylor Blau wrote:
> On Tue, Apr 25, 2023 at 02:06:25PM -0400, Derrick Stolee wrote:
>> For my curiosity, and since you already have a test environment set up,
>> could you redo the "without bitmaps" case with pack.useSparse true and
>> false? When the option was created and defaulted to true, we never
>> really explored comparing it to the bitmap case. In fact, I assumed the
>> bitmap case was faster in important cases like this (where there is a
>> substantial difference in object counts), but your data is surprising me
>> that the sparse algorithm is outperforming bitmaps, even with this new
>> algorithm.
>>
>> The main question I'd like to ask is: is pack.useSparse contributing
>> in an important way to the performance here?
> 
> I don't know enough about pack.useSparse to say with certainty, but
> trying again on the same repository (which is reasonably well-packed at
> the moment), they appear about the same:

Thanks for running that. I apologize, but I didn't sync in my head
that you're purposefully using 'git rev-list' to do the walk and the
config only makes a difference for 'git pack-objects'. The suspiciously
close timings is indeed too suspicious. Thus, the performance numbers
you are considering are not identical to what would happen in 'git
pack-objects' by default.

So, I created my own example using git.git, which included this input:

	refs/remotes/origin/master
	refs/remotes/origin/maint
	refs/remotes/origin/next
	refs/remotes/origin/seen
	^refs/remotes/origin/master~1000

And there is a clear preference for pack.useSparse=false in this
case:

Benchmark 1: GIT_TRACE2_PERF="/home/stolee/_git/git/git-review/trace-true.txt" ~/_git/git/git-review/git -c pack.useSparse=true -c pack.useBitmaps=false pack-objects --stdout --revs <in >/dev/null
  Time (mean ± σ):      2.857 s ±  0.031 s    [User: 6.850 s, System: 0.378 s]
  Range (min … max):    2.814 s …  2.922 s    10 runs
 
Benchmark 2: GIT_TRACE2_PERF="/home/stolee/_git/git/git-review/trace-false.txt" ~/_git/git/git-review/git -c pack.useSparse=false -c pack.useBitmaps=false pack-objects --stdout --revs <in >/dev/null
  Time (mean ± σ):      2.682 s ±  0.029 s    [User: 6.678 s, System: 0.388 s]
  Range (min … max):    2.653 s …  2.753 s    10 runs
 
and I used the perf traces to extract that the enumerate objects
phase took ~550ms for the sparse walk and ~375ms for the non-sparse
version.

But using this same input, I'm able to construct an example where
your modified bitmap walk is faster than the non-sparse object walk:

Benchmark 1: GIT_TRACE2_PERF="/home/stolee/_git/git/git-review/trace-true.txt" ~/_git/git/git-review/git -c pack.useSparse=false -c pack.useBitmaps=true pack-objects --stdout --revs <in >/dev/null
  Time (mean ± σ):      2.045 s ±  0.039 s    [User: 6.464 s, System: 0.334 s]
  Range (min … max):    1.966 s …  2.089 s    10 runs
 
Benchmark 2: GIT_TRACE2_PERF="/home/stolee/_git/git/git-review/trace-false.txt" ~/_git/git/git-review/git -c pack.useSparse=false -c pack.useBitmaps=false pack-objects --stdout --revs <in >/dev/null
  Time (mean ± σ):      2.138 s ±  0.034 s    [User: 6.070 s, System: 0.333 s]
  Range (min … max):    2.058 s …  2.182 s    10 runs

(Here, the enumerate objects phase takes ~230ms on average.)

I can easily swap this by changing my negative ref to

  ^revs/remotes/origin/master

Benchmark 1: GIT_TRACE2_PERF="/home/stolee/_git/git/git-review/trace-true.txt" ~/_git/git/git-review/git -c pack.useSparse=false -c pack.useBitmaps=true pack-objects --stdout --revs <in >/dev/null
  Time (mean ± σ):     521.0 ms ±  28.8 ms    [User: 1190.9 ms, System: 103.6 ms]
  Range (min … max):   496.5 ms … 589.8 ms    10 runs
 
Benchmark 2: GIT_TRACE2_PERF="/home/stolee/_git/git/git-review/trace-false.txt" ~/_git/git/git-review/git -c pack.useSparse=false -c pack.useBitmaps=false pack-objects --stdout --revs <in >/dev/null
  Time (mean ± σ):     498.5 ms ±  13.2 ms    [User: 1089.1 ms, System: 123.3 ms]
  Range (min … max):   471.3 ms … 513.6 ms    10 runs

which gives us reason to think about "small fetches" don't need
bitmaps, but "big fetches" do. But that can be done separately
from this patch.

Context about the sparse walk:

The sparse walk uses the names of the paths with respect to the root 
trees as a way to walk only the "new" objects without walking the
entire object set at the boundary. This makes a big difference in
repos with many files at HEAD, but the risk is that those path names
become significant overhead if there are enough new changes spread
across a large fraction of the tree. For clients pushing the work
of a single developer, the sparse algorithm outperforms the default
walk.

I wrote about this years ago in [1] and almost brought it up earlier
as a description of the commit boundary (called the frontier in [1]),
but the sparse object walk is helpful to visualize.

[1] https://devblogs.microsoft.com/devops/exploring-new-frontiers-for-git-push-performance/

Thanks,
-Stolee

  reply	other threads:[~2023-04-25 20:27 UTC|newest]

Thread overview: 51+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-04-25  0:00 [PATCH 0/3] pack-bitmap: boundary-based bitmap traversal Taylor Blau
2023-04-25  0:00 ` [PATCH 1/3] revision: support tracking uninteresting commits Taylor Blau
2023-04-25 18:15   ` Derrick Stolee
2023-05-03 21:48     ` Taylor Blau
2023-05-04 13:46       ` Derrick Stolee
2023-05-03 22:08     ` Taylor Blau
2023-05-04 13:59       ` Derrick Stolee
2023-05-05 17:30         ` Taylor Blau
2023-04-25 18:22   ` Junio C Hamano
2023-04-25 18:48     ` Taylor Blau
2023-04-25  0:00 ` [PATCH 2/3] pack-bitmap.c: extract `fill_in_bitmap()` Taylor Blau
2023-04-25 18:32   ` Junio C Hamano
2023-04-25 18:51     ` [PATCH 2/3] pack-bitmap.c: extract `fill_in_bitmap()`t Taylor Blau
2023-04-25  0:00 ` [PATCH 3/3] pack-bitmap.c: use commit boundary during bitmap traversal Taylor Blau
2023-04-25 18:52   ` Junio C Hamano
2023-05-02 21:31     ` Felipe Contreras
2023-05-03 21:42     ` Taylor Blau
2023-05-03 21:58       ` Junio C Hamano
2023-04-25 18:53   ` Derrick Stolee
2023-04-25 18:02 ` [PATCH 0/3] pack-bitmap: boundary-based " Junio C Hamano
2023-04-25 18:30   ` Derrick Stolee
2023-04-25 18:57     ` Taylor Blau
2023-04-25 19:52       ` Derrick Stolee
2023-05-03 21:43         ` Taylor Blau
2023-04-25 18:06 ` Derrick Stolee
2023-04-25 19:01   ` Taylor Blau
2023-04-25 20:27     ` Derrick Stolee [this message]
2023-05-01 22:08 ` Junio C Hamano
2023-05-02 23:52   ` Taylor Blau
2023-05-05 17:27 ` [PATCH v2 0/2] " Taylor Blau
2023-05-05 17:27   ` [PATCH v2 1/2] pack-bitmap.c: extract `fill_in_bitmap()` Taylor Blau
2023-05-05 17:27   ` [PATCH v2 2/2] pack-bitmap.c: use commit boundary during bitmap traversal Taylor Blau
2023-05-05 18:13     ` Derrick Stolee
2023-05-05 18:43       ` Taylor Blau
2023-05-05 17:33   ` [PATCH v2 0/2] pack-bitmap: boundary-based " Junio C Hamano
2023-05-05 17:59   ` Derrick Stolee
2023-05-05 18:46     ` [PATCH v2 0/2] pack-bitmap: boundary-based bitmap traversalt Taylor Blau
2023-05-05 20:45       ` Derrick Stolee
2023-05-08 17:38 ` [PATCH v3 0/3] pack-bitmap: boundary-based bitmap traversal Taylor Blau
2023-05-08 17:38   ` [PATCH v3 1/3] object: add object_array initializer helper function Taylor Blau
2023-05-08 17:38   ` [PATCH v3 2/3] pack-bitmap.c: extract `fill_in_bitmap()` Taylor Blau
2023-05-08 17:38   ` [PATCH v3 3/3] pack-bitmap.c: use commit boundary during bitmap traversal Taylor Blau
2023-05-08 20:53     ` Derrick Stolee
2023-05-08 22:12       ` Taylor Blau
2023-05-10 22:55   ` [PATCH v3 0/3] pack-bitmap: boundary-based " Junio C Hamano
2023-05-10 23:10     ` Taylor Blau
2023-05-11 15:23       ` Derrick Stolee
2023-06-08 16:25 ` [PATCH v4 " Taylor Blau
2023-06-08 16:25   ` [PATCH v4 1/3] object: add object_array initializer helper function Taylor Blau
2023-06-08 16:25   ` [PATCH v4 2/3] pack-bitmap.c: extract `fill_in_bitmap()` Taylor Blau
2023-06-08 16:25   ` [PATCH v4 3/3] pack-bitmap.c: use commit boundary during bitmap traversal Taylor Blau

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=aa162f0f-022d-7216-51c9-c53884362628@github.com \
    --to=derrickstolee@github.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=me@ttaylorr.com \
    --cc=peff@peff.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).