From: Derrick Stolee <stolee@gmail.com>
To: Jonathan Tan <jonathantanmy@google.com>, me@ttaylorr.com
Cc: git@vger.kernel.org, dstolee@microsoft.com, gitster@pobox.com,
peff@peff.net, martin.agren@gmail.com, szeder.dev@gmail.com
Subject: Re: [PATCH v2 23/24] pack-bitmap-write: relax unique rewalk condition
Date: Mon, 7 Dec 2020 13:43:51 -0500
Message-ID: <2f0540e6-b4f4-ea9f-bac0-ecf92c7b764d@gmail.com>
In-Reply-To: <20201207181909.3032039-1-jonathantanmy@google.com>

On 12/7/2020 1:19 PM, Jonathan Tan wrote:
>>>> In an effort to discover a happy medium, this change reduces the walk
>>>> for intermediate commits to only the first-parent history. This focuses
>>>> the walk on how the histories converge, which still has significant
>>>> reduction in repeat object walks. It is still possible to create
>>>> quadratic behavior in this version, but it is probably less likely in
>>>> realistic data shapes.
>>>
>>> Would this work? I agree that the width of the commit bitmasks would go
>>> down (and there would also be fewer commit bitmasks generated, further
>>> increasing the memory savings). But intuitively, if there is a commit
>>> that is selected and only accessible through non-1st-parent links, then
>>> any bitmaps generated for it cannot be contributed to its descendants
>>> (since there was no descendant-to-ancestor walk that could reach it in
>>> order to form the reverse edge).
>>
>> s/bitmaps/bitmasks.
>
> I do mean bitmaps there - bitmasks are contributed to parents, but
> bitmaps are contributed to descendants, if I remember correctly.

Ah, the confusion is around the word "contributed".

Yes, without walking all the parents, we will not populate the
reverse edges with all of the possible connections. Thus, the
step that pushes reachability bitmap bits along the reverse edges
will not be as effective.

And this is the whole point: the reverse edges existed to get us
into a state of _never_ walking an object multiple times, but that
ended up being too expensive to guarantee. This change relaxes that
condition in a way that still works for large, linear histories.

Since "pack-bitmap-write: fill bitmap with commit history" changed
fill_bitmap_commit() to walk commits until reaching those already in
the precomputed reachability bitmap, it will correctly walk far
enough to compute the reachability bitmap for that commit. It might
just walk objects that are part of _another_, already computed bitmap
that is not reachable via the first-parent history.

The very next patch, "pack-bitmap-write: better reuse bitmaps", fixes
this problem by checking for computed bitmaps during the walk in
fill_bitmap_commit().

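To make the difference concrete, here is a toy model in Python. It is
only a sketch of the idea, not the code in pack-bitmap-write.c: the
graph, the fill_bitmap() helper, and the COMPUTED table are all made up
for illustration, and it tracks commits only (real bitmaps also cover
trees and blobs). M is a merge whose second parent T has a finished
bitmap, but T is not on M's first-parent history, so no reverse edge
pushed T's bits into M.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical commit graph: commit -> parents, first parent first.
PARENTS: Dict[str, List[str]] = {
    "A": [],
    "B": ["A"],
    "S": ["A"],
    "T": ["S"],       # side branch tip
    "M": ["B", "T"],  # merge: first parent B, second parent T
}

# Bitmaps assumed to be finished already for the selected commits A and T.
COMPUTED: Dict[str, Set[str]] = {
    "A": {"A"},
    "T": {"T", "S", "A"},
}


def fill_bitmap(
    start: str,
    seed: Set[str],
    computed: Optional[Dict[str, Set[str]]] = None,
) -> Tuple[Set[str], int]:
    """Walk from `start`, stopping at anything already in the bitmap.

    `seed` holds the bits pushed along reverse edges before the walk.
    If `computed` is given, a commit with a finished bitmap is OR-ed in
    and not walked further. Returns (bitmap, commits actually walked).
    """
    bitmap = set(seed)
    walked = 0
    stack = [start]
    while stack:
        commit = stack.pop()
        if commit in bitmap:
            continue  # already covered, stop walking this path
        if computed is not None and commit in computed:
            bitmap |= computed[commit]  # reuse the finished bitmap
            continue
        bitmap.add(commit)
        walked += 1
        stack.extend(PARENTS[commit])
    return bitmap, walked


# Only A's bits reached M via the first-parent reverse edge.
plain, walked_plain = fill_bitmap("M", {"A"})
reused, walked_reused = fill_bitmap("M", {"A"}, COMPUTED)
print(sorted(plain), walked_plain)    # walks M, T, S, B: 4 commits
print(sorted(reused), walked_reused)  # walks M, B only: 2 commits
```

Both calls produce the same bitmap, which is why the relaxed walk stays
correct; the second one just avoids re-walking T's history.
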
>> We'll select commits independent of their first
>> parent histories, and so in the situation that you're describing, if C
>> reaches A only through non-1st-parent history, then A's bitmask will not
>> contain the bits from C.
>
> C is the descendant and A is the ancestor. Yes, A's bitmask will not
> contain the bits from C.
>
>> But when generating the reachability bitmap for C, we'll still find that
>> we've generated a bitmap for A, and we can copy its bits directly.
>
> Here is my contention - this can happen only if there is a reverse edge
> from A to C, as far as I can tell, but such a reverse edge has not been
> formed.

See above. This patch is completely correct given the changes to
fill_bitmap_commit() from earlier. It just needs a tweak (in the
next patch) to recover some of the performance.

>> If
>> this differs from an ancestor P that _is_ in the first-parent history,
>> then P pushed its bits to C before calling fill_bitmap_commit() through
>> the reverse edges.
>>
>>>> Here is some data taken on a fresh clone of the kernel:
>>>>
>>>>              |   runtime (sec)    |   peak heap (GB)   |
>>>>              |                    |                    |
>>>>              |   from  |   with   |   from  |   with   |
>>>>              | scratch | existing | scratch | existing |
>>>>   -----------+---------+----------+---------+-----------
>>>>     original |  64.044 |   83.241 |   2.088 |    2.194 |
>>>>   last patch |  44.811 |   27.828 |   2.289 |    2.358 |
>>>>   this patch | 100.641 |   35.560 |   2.152 |    2.224 |
>>>
>>> Hmm...the jump from 44 to 100 seems rather large.
>>
>> Indeed. It's ameliorated a little bit in the later patches. We are
>> over-walking some objects (as in we are walking them multiple times),
>> but the return we get is reducing the peak heap usage from what it was
>> in the last patch.
>>
>> In the "unfathomably large" category, this makes things tractable.
>
> Quoting from the next patch [1]:
>
>>              |   runtime (sec)    |   peak heap (GB)   |
>>              |                    |                    |
>>              |   from  |   with   |   from  |   with   |
>>              | scratch | existing | scratch | existing |
>>   -----------+---------+----------+---------+-----------
>>   last patch | 100.641 |   35.560 |   2.152 |    2.224 |
>>   this patch |  99.720 |   11.696 |   2.152 |    2.217 |
>
> That is true, but it is not ameliorated much :-(
>
> If you have steps to generate these timings, I would like to try
> comparing the performance between all patches and all-except-23.
>
> [1] https://lore.kernel.org/git/42399a1c2e52e1d055a2d0ad96af2ca4dce6b1a0.1605649533.git.me@ttaylorr.com/

The biggest problem is that all-except-23 is an unacceptable
final state, since it has a performance blowout on super-wide
repos such as the git/git fork network. Perhaps Taylor could
include some performance numbers on that, but I'm pretty sure
that the calculation literally OOMs instead of completing. It
might be worth an explicit mention in the patch.

It might also be better to always include a baseline from the
start of the series to ensure that the final state is better
than the initial state. With only the last/this comparison,
it doesn't look great when we backtrack in performance (even
when it is necessary to do so).

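As a rough illustration of what a baseline column would show, the
runtimes quoted above can be divided by the "original" row. The
numbers below are copied from the two tables in this thread; the
dictionary layout and the relative_to_baseline() helper are just an
assumed way to organize them, and the patch labels follow the v2
numbering.

```python
# Runtime in seconds, copied from the tables quoted in this thread.
RUNTIME_SEC = {
    "original": {"from scratch": 64.044, "with existing": 83.241},
    "patch 22": {"from scratch": 44.811, "with existing": 27.828},
    "patch 23": {"from scratch": 100.641, "with existing": 35.560},
    "patch 24": {"from scratch": 99.720, "with existing": 11.696},
}


def relative_to_baseline(name: str) -> dict:
    """Return each runtime for `name` as a ratio of the original's."""
    base = RUNTIME_SEC["original"]
    row = RUNTIME_SEC[name]
    return {column: row[column] / base[column] for column in base}


for name in RUNTIME_SEC:
    ratios = relative_to_baseline(name)
    print(name, {column: round(r, 2) for column, r in ratios.items()})
```

Against the baseline, the end of the series is roughly 1.56x slower
from scratch but about 7x faster with existing bitmaps, which the
last/this comparison alone does not make visible.
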
Thanks,
-Stolee