From: Taylor Blau <me@ttaylorr.com>
To: git@vger.kernel.org
Cc: gitster@pobox.com, jonathantanmy@google.com, stolee@gmail.com
Subject: [PATCH v4 0/9] midx: prevent bitmap corruption when permuting pack order
Date: Tue, 25 Jan 2022 17:40:57 -0500
Message-ID: <cover.1643150456.git.me@ttaylorr.com>
In-Reply-To: <cover.1638991570.git.me@ttaylorr.com>

Here is a(n even) small(er) reroll of my series which fixes a serious problem
with MIDX bitmaps: they can become corrupt when their pack order is permuted.

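The only substantive change is in "midx.c: make changing the preferred pack
safe", which goes back to using finalize_object_file(), since it behaves
consistently with other parts of the code that touch $GIT_DIR/objects. This is
safe to do given the other change described in that patch (forcing the MIDX's
checksum to change whenever the object order does). The patch description has
been updated slightly to reflect this.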
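
For readers who have not followed the earlier rounds, here is a minimal sketch
of why embedding the object order matters. It is a toy model, not code from
midx.c: `struct toy_midx`, `toy_checksum()` and `toy_checksums_differ()` are
hypothetical names, and FNV-1a stands in for the MIDX's real trailing checksum.
The two "MIDXs" below track the same packs and differ only in object order, as
happens when only the preferred pack changes.

```c
#include <stddef.h>
#include <stdint.h>

#define FNV_OFFSET 14695981039346656037ULL
#define FNV_PRIME 1099511628211ULL

/* FNV-1a: an assumed stand-in hash, NOT the checksum Git actually uses. */
static uint64_t fnv1a(uint64_t h, const void *data, size_t len)
{
	const unsigned char *p = data;
	size_t i;

	for (i = 0; i < len; i++) {
		h ^= p[i];
		h *= FNV_PRIME;
	}
	return h;
}

/* Hypothetical toy "MIDX": the packs it tracks plus an object order. */
struct toy_midx {
	const uint32_t *packs;
	size_t packs_nr;
	const uint32_t *order;
	size_t order_nr;
};

/* Checksum over the pack list, optionally covering the object order too. */
static uint64_t toy_checksum(const struct toy_midx *m, int embed_order)
{
	uint64_t h = fnv1a(FNV_OFFSET, m->packs,
			   m->packs_nr * sizeof(*m->packs));

	if (embed_order)
		h = fnv1a(h, m->order, m->order_nr * sizeof(*m->order));
	return h;
}

/*
 * Same packs, permuted object order. Returns 1 if the two checksums
 * differ, 0 if they are identical.
 */
int toy_checksums_differ(int embed_order)
{
	static const uint32_t packs[] = { 10, 20 };
	static const uint32_t old_order[] = { 0, 1 };
	static const uint32_t new_order[] = { 1, 0 };
	struct toy_midx before = { packs, 2, old_order, 2 };
	struct toy_midx after = { packs, 2, new_order, 2 };

	return toy_checksum(&before, embed_order) !=
	       toy_checksum(&after, embed_order);
}
```

With embed_order set to 0 the two checksums are identical, so any auxiliary
file named after the checksum would collide with its stale predecessor. With
embed_order set to 1 they differ, which is the property the series relies on.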

There is also a minor tweak to the tests towards the end of the series: the
"reverse index exists" test title in t/lib-bitmap.sh now names the reverse
index source. Otherwise this is unchanged from v3.

It is prepared on the tip of master (which is 89bece5c8c at the time of
writing).

Taylor Blau (9):
  t5326: demonstrate bitmap corruption after permutation
  midx.c: make changing the preferred pack safe
  pack-revindex.c: instrument loading on-disk reverse index
  t5326: drop unnecessary setup
  t5326: extract `test_rev_exists`
  t5326: move tests to t/lib-bitmap.sh
  t/lib-bitmap.sh: parameterize tests over reverse index source
  midx: read `RIDX` chunk when present
  pack-bitmap.c: gracefully fallback after opening pack/MIDX

 Documentation/technical/multi-pack-index.txt |   1 +
 Documentation/technical/pack-format.txt      |  13 +-
 midx.c                                       |  29 ++-
 midx.h                                       |   1 +
 pack-bitmap.c                                |   4 +
 pack-revindex.c                              |  20 ++
 t/lib-bitmap.sh                              | 185 +++++++++++++++++++
 t/t5310-pack-bitmaps.sh                      |  28 +++
 t/t5326-multi-pack-bitmaps.sh                | 164 +++-------------
 t/t5327-multi-pack-bitmaps-rev.sh            |  23 +++
 t/t7700-repack.sh                            |   4 -
 11 files changed, 321 insertions(+), 151 deletions(-)
 create mode 100755 t/t5327-multi-pack-bitmaps-rev.sh

Range-diff against v3:
 1:  babce7d29a =  1:  7ea9cced8e t5326: demonstrate bitmap corruption after permutation
 2:  7d20c13f8b !  2:  62d7561482 midx.c: make changing the preferred pack safe
    @@ Commit message
             that means that a modified .rev file will not be moved into place if
             MIDX's checksum was unchanged.
     
    -    The fix here is two-fold. First, we need to stop linking the file into
    -    place and instead rename it. It's likely we were using
    -    finalize_object_file() instead of a pure rename() because the former
    -    also adjusts shared permissions. But that is unnecessary, because we
    -    already do so in write_rev_file_order(), so rename alone is safe.
    +    This fix is to force the MIDX's checksum to change when the preferred
    +    pack changes but the set of packs contained in the MIDX does not. In
    +    other words, when the object order changes, the MIDX's checksum needs to
    +    change with it (regardless of whether the MIDX is tracking the same or
    +    different packs).
     
    -    But we also need to make the MIDX's checksum change in some way when the
    -    preferred pack changes without altering the set of packs stored in a
    -    MIDX to prevent a race where the new .rev file is moved into place
    -    before the MIDX is updated. Here, you'd get the opposite effect: reading
    -    old bitmaps with the new object order.
    +    This prevents a race whereby changing the object order (but not the
    +    packs themselves) enables a reader to see the new .rev file with the old
    +    MIDX, or similarly seeing the new bitmap with the old object order.
     
    -    But this race bites us even here: suppose that we didn't change the MIDX
    -    checksum, but only renamed the auxiliary object order into place instead
    -    of hardlinking it. Then when we go to generate the new bitmap, we'll
    -    load the old MIDX bitmap, along with the MIDX that it references. That's
    -    fine, since the new MIDX isn't moved into place until after the new
    -    bitmap is generated. But the new object order *has* been moved into
    -    place. So we'll read the old bitmaps in the new order when generating
    -    the new bitmap file, meaning that without this secondary change, bitmap
    -    generation itself would become a victim of the race described here.
    +    But why can't we just stop hardlinking the .rev into place instead
    +    adding additional data to the MIDX? Suppose that's what we did. Then
    +    when we go to generate the new bitmap, we'll load the old MIDX bitmap,
    +    along with the MIDX that it references. That's fine, since the new MIDX
    +    isn't moved into place until after the new bitmap is generated. But the
    +    new object order *has* been moved into place. So we'll read the old
    +    bitmaps in the new order when generating the new bitmap file, meaning
    +    that without this secondary change, bitmap generation itself would
    +    become a victim of the race described here.
     
         This can all be prevented by forcing the MIDX's checksum to change when
    -    the object order changes. We could include the entire object order in
    -    the MIDX, but doing so is somewhat awkward. (For example, the code that
    -    writes a .rev file expects to know the checksum of the associated pack
    -    or MIDX, but writing that data into the MIDX itself makes that a
    -    circular dependency).
    +    the object order does. By embedding the entire object order into the
    +    MIDX, we do just that. That is, the MIDX's checksum will change in
    +    response to any perturbation of the underlying object order. In t5326,
    +    this will cause the MIDX's checksum to update (even without changing the
    +    set of packs in the MIDX), preventing the stale read problem.
     
    -    Instead, make the object order used during bitmap generation part of the
    -    MIDX itself. That means that the new test in t5326 will cause the MIDX's
    -    checksum to update, preventing the stale read problem.
    +    Note that this makes it safe to continue to link(2) the MIDX .rev file
    +    into place, since it is now impossible to have a .rev file that is
    +    out-of-sync with the MIDX whose checksum it references. (But we will do
    +    away with MIDX .rev files later in this series anyway, so this is
    +    somewhat of a moot point).
     
         In theory, it is possible to store a "fingerprint" of the full object
         order here, so long as that fingerprint changes at least as often as the
    @@ midx.c: static int write_midx_large_offsets(struct hashfile *f,
      struct midx_pack_order_data {
      	uint32_t nr;
      	uint32_t pack;
    -@@ midx.c: static void write_midx_reverse_index(char *midx_name, unsigned char *midx_hash,
    - 	tmp_file = write_rev_file_order(NULL, ctx->pack_order, ctx->entries_nr,
    - 					midx_hash, WRITE_REV);
    - 
    --	if (finalize_object_file(tmp_file, buf.buf))
    -+	if (rename(tmp_file, buf.buf))
    - 		die(_("cannot store reverse index file"));
    - 
    - 	strbuf_release(&buf);
     @@ midx.c: static int write_midx_internal(const char *object_dir,
      			(size_t)ctx.num_large_offsets * MIDX_CHUNK_LARGE_OFFSET_WIDTH,
      			write_midx_large_offsets);
 3:  3279e2eb9b =  3:  abc18613e0 pack-revindex.c: instrument loading on-disk reverse index
 4:  5818621ea8 =  4:  80589d7ae6 t5326: drop unnecessary setup
 5:  33502d6a17 =  5:  b9c4ff8636 t5326: extract `test_rev_exists`
 6:  76e23cae0f =  6:  00c6e914b4 t5326: move tests to t/lib-bitmap.sh
 7:  7ce3dc60f9 !  7:  3f35ef6499 t/lib-bitmap.sh: parameterize tests over reverse index source
    @@ t/lib-bitmap.sh: midx_pack_source () {
      	commit="$1"
     +	kind="$2"
      
    - 	test_expect_success 'reverse index exists' '
    +-	test_expect_success 'reverse index exists' '
    ++	test_expect_success "reverse index exists ($kind)" '
      		GIT_TRACE2_EVENT=$(pwd)/event.trace \
      			git rev-list --test-bitmap "$commit" &&
      
 8:  55aa69de12 =  8:  94563cf038 midx: read `RIDX` chunk when present
 9:  9707d5ea44 =  9:  581b723792 pack-bitmap.c: gracefully fallback after opening pack/MIDX
-- 
2.34.1.455.gd6eb6fd089
