From: Taylor Blau <me@ttaylorr.com>
To: git@vger.kernel.org
Cc: dstolee@microsoft.com, gitster@pobox.com, peff@peff.net
Subject: [PATCH 00/23] pack-bitmap: bitmap generation improvements
Date: Wed, 11 Nov 2020 14:41:34 -0500 [thread overview]
Message-ID: <cover.1605123652.git.me@ttaylorr.com> (raw)
This series contains some patches that GitHub has been using in its fork
for the past few months to improve generating reachability bitmaps,
particularly in pathological cases, such as the repo containing all
forks of chromium/chromium.
The patches that follow are organized into five parts:
- The first nine patches do some basic clean-up and fix a bug that we
were able to exercise in tests while writing these patches.
- The next two patches reimplements bitmap writing in order to avoid making
multiple passes over the object graph. This approach ends up
regressing both the time and memory used to generate bitmaps on the
kernel's fork-network, but ends up being a useful stepping stone for
further improvements.
- The six patches that follow that culminates in a patch to build
fewer intermediate bitmaps during the walk in order to reduce both
memory and time for reasonably-sized repositories. (Which
intermediate bitmaps are considered "important" is discussed in
detail in the seventeenth patch).
- The next several patches make reusing previously generated reachability
bitmaps purely an optimization for generating new bitmaps. Importantly, that
allows the bitmap selection process to pick better commits to bitmap going
forward, rather than blindly reusing previously selected ones. They also
include some light refactoring, and a patch to avoid tree walks when
existing bitmaps suffice.
- The final two patches address a trade-off in the prior patches between
walking a wide history only once with high memory cost, and walking the same
history multiple times with lower memory cost. Here, the walk is reduced to
only cover the first-parent history. The final patch treats existing bitmaps
as maximal in order to make it more difficult for a different set of
selected commits to "walk around" the previously selected commits and force
a large number of new bitmaps to be computed.
In the end, no block-buster performance improvements are attained on
normal-to-large sized repositories, but the new bitmap generation routine helps
substantially on enormous repositories, like the chromium/chromium fork-network.
Individual performance numbers are available in the patches throughout.
This series is a prerequisite to a list of other bitmap-related patches in
GitHub's fork, including multi-pack bitmaps.
Derrick Stolee (9):
pack-bitmap-write: fill bitmap with commit history
bitmap: add bitmap_diff_nonzero()
commit: implement commit_list_contains()
t5310: add branch-based checks
pack-bitmap-write: rename children to reverse_edges
pack-bitmap-write: build fewer intermediate bitmaps
pack-bitmap-write: use existing bitmaps
pack-bitmap-write: relax unique rewalk condition
pack-bitmap-write: better reuse bitmaps
Jeff King (11):
pack-bitmap: fix header size check
pack-bitmap: bounds-check size of cache extension
t5310: drop size of truncated ewah bitmap
rev-list: die when --test-bitmap detects a mismatch
ewah: factor out bitmap growth
ewah: make bitmap growth less aggressive
ewah: implement bitmap_or()
ewah: add bitmap_dup() function
pack-bitmap-write: reimplement bitmap writing
pack-bitmap-write: pass ownership of intermediate bitmaps
pack-bitmap-write: ignore BITMAP_FLAG_REUSE
Taylor Blau (3):
ewah/ewah_bitmap.c: grow buffer past 1
pack-bitmap: factor out 'bitmap_for_commit()'
pack-bitmap: factor out 'add_commit_to_bitmap()'
builtin/pack-objects.c | 1 -
commit.c | 11 +
commit.h | 2 +
ewah/bitmap.c | 54 ++++-
ewah/ewah_bitmap.c | 2 +-
ewah/ewok.h | 3 +-
pack-bitmap-write.c | 452 +++++++++++++++++++++++++---------------
pack-bitmap.c | 130 +++++-------
pack-bitmap.h | 8 +-
t/t5310-pack-bitmaps.sh | 164 ++++++++++++---
10 files changed, 548 insertions(+), 279 deletions(-)
--
2.29.2.156.gc03786897f
next reply other threads:[~2020-11-11 19:41 UTC|newest]
Thread overview: 173+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-11-11 19:41 Taylor Blau [this message]
2020-11-11 19:41 ` [PATCH 01/23] ewah/ewah_bitmap.c: grow buffer past 1 Taylor Blau
2020-11-22 19:36 ` Junio C Hamano
2020-11-23 16:22 ` Taylor Blau
2020-11-24 2:48 ` Jeff King
2020-11-24 2:51 ` Jeff King
2020-12-01 22:56 ` Taylor Blau
2020-11-11 19:41 ` [PATCH 02/23] pack-bitmap: fix header size check Taylor Blau
2020-11-12 17:39 ` Martin Ågren
2020-11-11 19:42 ` [PATCH 03/23] pack-bitmap: bounds-check size of cache extension Taylor Blau
2020-11-12 17:47 ` Martin Ågren
2020-11-13 4:57 ` Jeff King
2020-11-13 5:26 ` Martin Ågren
2020-11-13 21:29 ` Taylor Blau
2020-11-13 21:39 ` Jeff King
2020-11-13 21:49 ` Taylor Blau
2020-11-13 22:11 ` Jeff King
2020-11-11 19:42 ` [PATCH 04/23] t5310: drop size of truncated ewah bitmap Taylor Blau
2020-11-11 19:42 ` [PATCH 05/23] rev-list: die when --test-bitmap detects a mismatch Taylor Blau
2020-11-11 19:42 ` [PATCH 06/23] ewah: factor out bitmap growth Taylor Blau
2020-11-11 19:42 ` [PATCH 07/23] ewah: make bitmap growth less aggressive Taylor Blau
2020-11-22 20:32 ` Junio C Hamano
2020-11-23 16:49 ` Taylor Blau
2020-11-24 3:00 ` Jeff King
2020-11-24 20:11 ` Junio C Hamano
2020-11-11 19:43 ` [PATCH 08/23] ewah: implement bitmap_or() Taylor Blau
2020-11-22 20:34 ` Junio C Hamano
2020-11-23 16:52 ` Taylor Blau
2020-11-11 19:43 ` [PATCH 09/23] ewah: add bitmap_dup() function Taylor Blau
2020-11-11 19:43 ` [PATCH 10/23] pack-bitmap-write: reimplement bitmap writing Taylor Blau
2020-11-11 19:43 ` [PATCH 11/23] pack-bitmap-write: pass ownership of intermediate bitmaps Taylor Blau
2020-11-11 19:43 ` [PATCH 12/23] pack-bitmap-write: fill bitmap with commit history Taylor Blau
2020-11-11 19:43 ` [PATCH 13/23] bitmap: add bitmap_diff_nonzero() Taylor Blau
2020-11-11 19:43 ` [PATCH 14/23] commit: implement commit_list_contains() Taylor Blau
2020-11-11 19:43 ` [PATCH 15/23] t5310: add branch-based checks Taylor Blau
2020-11-11 20:58 ` Derrick Stolee
2020-11-11 21:04 ` Junio C Hamano
2020-11-15 23:26 ` Johannes Schindelin
2020-11-11 19:43 ` [PATCH 16/23] pack-bitmap-write: rename children to reverse_edges Taylor Blau
2020-11-11 19:43 ` [PATCH 17/23] pack-bitmap-write: build fewer intermediate bitmaps Taylor Blau
2020-11-13 22:23 ` SZEDER Gábor
2020-11-13 23:03 ` Jeff King
2020-11-14 6:23 ` Jeff King
2020-11-11 19:43 ` [PATCH 18/23] pack-bitmap-write: ignore BITMAP_FLAG_REUSE Taylor Blau
2020-11-11 19:44 ` [PATCH 19/23] pack-bitmap: factor out 'bitmap_for_commit()' Taylor Blau
2020-11-11 19:44 ` [PATCH 20/23] pack-bitmap: factor out 'add_commit_to_bitmap()' Taylor Blau
2020-11-11 19:44 ` [PATCH 21/23] pack-bitmap-write: use existing bitmaps Taylor Blau
2020-11-11 19:44 ` [PATCH 22/23] pack-bitmap-write: relax unique rewalk condition Taylor Blau
2020-11-11 19:44 ` [PATCH 23/23] pack-bitmap-write: better reuse bitmaps Taylor Blau
2020-11-17 21:46 ` [PATCH v2 00/24] pack-bitmap: bitmap generation improvements Taylor Blau
2020-11-17 21:46 ` [PATCH v2 01/24] ewah/ewah_bitmap.c: grow buffer past 1 Taylor Blau
2020-11-17 21:46 ` [PATCH v2 02/24] pack-bitmap: fix header size check Taylor Blau
2020-11-17 21:46 ` [PATCH v2 03/24] pack-bitmap: bounds-check size of cache extension Taylor Blau
2020-11-17 21:46 ` [PATCH v2 04/24] t5310: drop size of truncated ewah bitmap Taylor Blau
2020-11-17 21:46 ` [PATCH v2 05/24] rev-list: die when --test-bitmap detects a mismatch Taylor Blau
2020-11-17 21:46 ` [PATCH v2 06/24] ewah: factor out bitmap growth Taylor Blau
2020-11-17 21:47 ` [PATCH v2 07/24] ewah: make bitmap growth less aggressive Taylor Blau
2020-11-17 21:47 ` [PATCH v2 08/24] ewah: implement bitmap_or() Taylor Blau
2020-11-17 21:47 ` [PATCH v2 09/24] ewah: add bitmap_dup() function Taylor Blau
2020-11-17 21:47 ` [PATCH v2 10/24] pack-bitmap-write: reimplement bitmap writing Taylor Blau
2020-11-25 0:53 ` Jonathan Tan
2020-11-28 17:27 ` Taylor Blau
2020-11-17 21:47 ` [PATCH v2 11/24] pack-bitmap-write: pass ownership of intermediate bitmaps Taylor Blau
2020-11-25 1:00 ` Jonathan Tan
2020-11-17 21:47 ` [PATCH v2 12/24] pack-bitmap-write: fill bitmap with commit history Taylor Blau
2020-11-22 21:50 ` Junio C Hamano
2020-11-23 14:54 ` Derrick Stolee
2020-11-25 1:14 ` Jonathan Tan
2020-11-28 17:21 ` Taylor Blau
2020-11-30 18:33 ` Jonathan Tan
2020-11-17 21:47 ` [PATCH v2 13/24] bitmap: add bitmap_diff_nonzero() Taylor Blau
2020-11-22 22:01 ` Junio C Hamano
2020-11-23 20:19 ` Taylor Blau
2020-11-17 21:47 ` [PATCH v2 14/24] commit: implement commit_list_contains() Taylor Blau
2020-11-17 21:47 ` [PATCH v2 15/24] t5310: add branch-based checks Taylor Blau
2020-11-25 1:17 ` Jonathan Tan
2020-11-28 17:30 ` Taylor Blau
2020-11-17 21:47 ` [PATCH v2 16/24] pack-bitmap-write: rename children to reverse_edges Taylor Blau
2020-11-17 21:47 ` [PATCH v2 17/24] pack-bitmap.c: check reads more aggressively when loading Taylor Blau
2020-11-17 21:48 ` [PATCH v2 18/24] pack-bitmap-write: build fewer intermediate bitmaps Taylor Blau
2020-11-24 6:07 ` Jonathan Tan
2020-11-25 1:46 ` Jonathan Tan
2020-11-30 18:41 ` Derrick Stolee
2020-11-17 21:48 ` [PATCH v2 19/24] pack-bitmap-write: ignore BITMAP_FLAG_REUSE Taylor Blau
2020-12-02 7:13 ` Jonathan Tan
2020-11-17 21:48 ` [PATCH v2 20/24] pack-bitmap: factor out 'bitmap_for_commit()' Taylor Blau
2020-12-02 7:17 ` Jonathan Tan
2020-11-17 21:48 ` [PATCH v2 21/24] pack-bitmap: factor out 'add_commit_to_bitmap()' Taylor Blau
2020-12-02 7:20 ` Jonathan Tan
2020-11-17 21:48 ` [PATCH v2 22/24] pack-bitmap-write: use existing bitmaps Taylor Blau
2020-12-02 7:28 ` Jonathan Tan
2020-12-02 16:21 ` Taylor Blau
2020-11-17 21:48 ` [PATCH v2 23/24] pack-bitmap-write: relax unique rewalk condition Taylor Blau
2020-12-02 7:44 ` Jonathan Tan
2020-12-02 16:30 ` Taylor Blau
2020-12-07 18:19 ` Jonathan Tan
2020-12-07 18:43 ` Derrick Stolee
2020-12-07 18:45 ` Derrick Stolee
2020-12-07 18:48 ` Jeff King
2020-11-17 21:48 ` [PATCH v2 24/24] pack-bitmap-write: better reuse bitmaps Taylor Blau
2020-12-02 8:08 ` Jonathan Tan
2020-12-02 16:35 ` Taylor Blau
2020-12-02 18:22 ` Derrick Stolee
2020-12-02 18:25 ` Taylor Blau
2020-12-07 18:26 ` Jonathan Tan
2020-12-07 18:24 ` Jonathan Tan
2020-12-07 19:20 ` Derrick Stolee
2020-11-18 18:32 ` [PATCH v2 00/24] pack-bitmap: bitmap generation improvements SZEDER Gábor
2020-11-18 19:51 ` Taylor Blau
2020-11-22 2:17 ` Taylor Blau
2020-11-22 2:28 ` Taylor Blau
2020-11-20 6:34 ` Martin Ågren
2020-11-21 19:37 ` Junio C Hamano
2020-11-21 20:11 ` Martin Ågren
2020-11-22 2:31 ` Taylor Blau
2020-11-24 2:43 ` Jeff King
2020-12-01 23:04 ` Taylor Blau
2020-12-01 23:37 ` Jonathan Tan
2020-12-01 23:43 ` Taylor Blau
2020-12-02 8:11 ` Jonathan Tan
2020-12-08 0:04 ` [PATCH v3 " Taylor Blau
2020-12-08 0:04 ` [PATCH v3 01/24] ewah/ewah_bitmap.c: avoid open-coding ALLOC_GROW() Taylor Blau
2020-12-08 0:04 ` [PATCH v3 02/24] pack-bitmap: fix header size check Taylor Blau
2020-12-08 0:04 ` [PATCH v3 03/24] pack-bitmap: bounds-check size of cache extension Taylor Blau
2020-12-08 0:04 ` [PATCH v3 04/24] t5310: drop size of truncated ewah bitmap Taylor Blau
2020-12-08 0:04 ` [PATCH v3 05/24] rev-list: die when --test-bitmap detects a mismatch Taylor Blau
2020-12-08 0:04 ` [PATCH v3 06/24] ewah: factor out bitmap growth Taylor Blau
2020-12-08 0:04 ` [PATCH v3 07/24] ewah: make bitmap growth less aggressive Taylor Blau
2020-12-08 0:04 ` [PATCH v3 08/24] ewah: implement bitmap_or() Taylor Blau
2020-12-08 0:04 ` [PATCH v3 09/24] ewah: add bitmap_dup() function Taylor Blau
2020-12-08 0:04 ` [PATCH v3 10/24] pack-bitmap-write: reimplement bitmap writing Taylor Blau
2020-12-08 0:05 ` [PATCH v3 11/24] pack-bitmap-write: pass ownership of intermediate bitmaps Taylor Blau
2020-12-08 0:05 ` [PATCH v3 12/24] pack-bitmap-write: fill bitmap with commit history Taylor Blau
2020-12-08 0:05 ` [PATCH v3 13/24] bitmap: implement bitmap_is_subset() Taylor Blau
2020-12-08 0:05 ` [PATCH v3 14/24] commit: implement commit_list_contains() Taylor Blau
2020-12-08 0:05 ` [PATCH v3 15/24] t5310: add branch-based checks Taylor Blau
2020-12-08 0:05 ` [PATCH v3 16/24] pack-bitmap-write: rename children to reverse_edges Taylor Blau
2020-12-08 0:05 ` [PATCH v3 17/24] pack-bitmap.c: check reads more aggressively when loading Taylor Blau
2020-12-08 0:05 ` [PATCH v3 18/24] pack-bitmap-write: build fewer intermediate bitmaps Taylor Blau
2020-12-08 0:05 ` [PATCH v3 19/24] pack-bitmap-write: ignore BITMAP_FLAG_REUSE Taylor Blau
2020-12-08 0:05 ` [PATCH v3 20/24] pack-bitmap: factor out 'bitmap_for_commit()' Taylor Blau
2020-12-08 0:05 ` [PATCH v3 21/24] pack-bitmap: factor out 'add_commit_to_bitmap()' Taylor Blau
2020-12-08 0:05 ` [PATCH v3 22/24] pack-bitmap-write: use existing bitmaps Taylor Blau
2020-12-08 0:05 ` [PATCH v3 23/24] pack-bitmap-write: relax unique rewalk condition Taylor Blau
2020-12-08 0:05 ` [PATCH v3 24/24] pack-bitmap-write: better reuse bitmaps Taylor Blau
2020-12-08 20:56 ` [PATCH v3 00/24] pack-bitmap: bitmap generation improvements Junio C Hamano
2020-12-08 21:03 ` Taylor Blau
2020-12-08 22:03 ` Junio C Hamano
2020-12-08 22:03 ` [PATCH v4 " Taylor Blau
2020-12-08 22:03 ` [PATCH v4 01/24] ewah/ewah_bitmap.c: avoid open-coding ALLOC_GROW() Taylor Blau
2020-12-08 22:03 ` [PATCH v4 02/24] pack-bitmap: fix header size check Taylor Blau
2020-12-08 22:03 ` [PATCH v4 03/24] pack-bitmap: bounds-check size of cache extension Taylor Blau
2020-12-08 22:03 ` [PATCH v4 04/24] t5310: drop size of truncated ewah bitmap Taylor Blau
2020-12-08 22:03 ` [PATCH v4 05/24] rev-list: die when --test-bitmap detects a mismatch Taylor Blau
2020-12-08 22:03 ` [PATCH v4 06/24] ewah: factor out bitmap growth Taylor Blau
2020-12-08 22:03 ` [PATCH v4 07/24] ewah: make bitmap growth less aggressive Taylor Blau
2020-12-08 22:03 ` [PATCH v4 08/24] ewah: implement bitmap_or() Taylor Blau
2020-12-08 22:03 ` [PATCH v4 09/24] ewah: add bitmap_dup() function Taylor Blau
2020-12-08 22:03 ` [PATCH v4 10/24] pack-bitmap-write: reimplement bitmap writing Taylor Blau
2020-12-08 22:03 ` [PATCH v4 11/24] pack-bitmap-write: pass ownership of intermediate bitmaps Taylor Blau
2020-12-08 22:04 ` [PATCH v4 12/24] pack-bitmap-write: fill bitmap with commit history Taylor Blau
2020-12-08 22:04 ` [PATCH v4 13/24] bitmap: implement bitmap_is_subset() Taylor Blau
2020-12-08 22:04 ` [PATCH v4 14/24] commit: implement commit_list_contains() Taylor Blau
2020-12-08 22:04 ` [PATCH v4 15/24] t5310: add branch-based checks Taylor Blau
2020-12-08 22:04 ` [PATCH v4 16/24] pack-bitmap-write: rename children to reverse_edges Taylor Blau
2020-12-08 22:04 ` [PATCH v4 17/24] pack-bitmap.c: check reads more aggressively when loading Taylor Blau
2020-12-08 22:04 ` [PATCH v4 18/24] pack-bitmap-write: build fewer intermediate bitmaps Taylor Blau
2020-12-08 22:04 ` [PATCH v4 19/24] pack-bitmap-write: ignore BITMAP_FLAG_REUSE Taylor Blau
2020-12-08 22:04 ` [PATCH v4 20/24] pack-bitmap: factor out 'bitmap_for_commit()' Taylor Blau
2020-12-08 22:05 ` [PATCH v4 21/24] pack-bitmap: factor out 'add_commit_to_bitmap()' Taylor Blau
2020-12-08 22:05 ` [PATCH v4 22/24] pack-bitmap-write: use existing bitmaps Taylor Blau
2020-12-08 22:05 ` [PATCH v4 23/24] pack-bitmap-write: relax unique revwalk condition Taylor Blau
2020-12-08 22:05 ` [PATCH v4 24/24] pack-bitmap-write: better reuse bitmaps Taylor Blau
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
List information: http://vger.kernel.org/majordomo-info.html
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=cover.1605123652.git.me@ttaylorr.com \
--to=me@ttaylorr.com \
--cc=dstolee@microsoft.com \
--cc=git@vger.kernel.org \
--cc=gitster@pobox.com \
--cc=peff@peff.net \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
Code repositories for project(s) associated with this public inbox
https://80x24.org/mirrors/git.git
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).