git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Taylor Blau <me@ttaylorr.com>
To: git@vger.kernel.org
Cc: gitster@pobox.com, peff@peff.net, stolee@gmail.com
Subject: [PATCH v2 0/8] midx: prevent bitmap corruption when permuting pack order
Date: Mon, 13 Dec 2021 20:55:22 -0500	[thread overview]
Message-ID: <cover.1639446906.git.me@ttaylorr.com> (raw)
In-Reply-To: <cover.1638991570.git.me@ttaylorr.com>

Here is a reroll of my series which fixes a serious problem with MIDX bitmaps by
which they can become corrupt when permuting their pack order.

The problem is described in detail in the second patch. But this version goes
further and stops writing the MIDX .rev file altogether, instead reading the
same data out of a new optional 'RIDX' chunk (falling back to the .rev file when
necessary).

Applying just the first two patches is sufficient to fix this problem, but the
remaining six go further to deprecate the MIDX .rev file altogether. I'm not
crazy about the testing strategy, though I think it was the best that I came up
with.

The idea is to essentially copy the core of t5326 into a new t5327. The former
uses the RIDX chunk, and the latter uses .rev files. I would have like to just
run the same tests in t5326 twice, but they munge the state of their test
repository enough to make it sufficiently awkward to do in practice.

So I'm definitely open to suggestions there, but otherwise this series should go
a long ways towards fixing my design mistake of having the MIDX .rev file be
separate from the MIDX itself.

Taylor Blau (8):
  t5326: demonstrate bitmap corruption after permutation
  midx.c: make changing the preferred pack safe
  pack-revindex.c: instrument loading on-disk reverse index
  t5326: drop unnecessary setup
  t5326: extract `test_rev_exists`
  t5326: move tests to t/lib-bitmap.sh
  t/lib-bitmap.sh: parameterize tests over reverse index source
  midx: read `RIDX` chunk when present

 Documentation/technical/multi-pack-index.txt |   1 +
 Documentation/technical/pack-format.txt      |  13 +-
 midx.c                                       |  31 +++-
 midx.h                                       |   1 +
 pack-revindex.c                              |  20 ++
 t/lib-bitmap.sh                              | 186 +++++++++++++++++++
 t/t5326-multi-pack-bitmaps.sh                | 144 +-------------
 t/t5327-multi-pack-bitmaps-rev.sh            |  23 +++
 t/t7700-repack.sh                            |   4 -
 9 files changed, 271 insertions(+), 152 deletions(-)
 create mode 100755 t/t5327-multi-pack-bitmaps-rev.sh

Range-diff against v1:
1:  ea91cebb6b = 1:  dfbac4bc60 t5326: demonstrate bitmap corruption after permutation
2:  fa42d816f4 ! 2:  4ea52e66dd midx.c: make changing the preferred pack safe
    @@ Commit message
         or MIDX, but writing that data into the MIDX itself makes that a
         circular dependency).
     
    -    Instead, make the pack order used during bitmap generation part of the
    +    Instead, make the object order used during bitmap generation part of the
         MIDX itself. That means that the new test in t5326 will cause the MIDX's
         checksum to update, preventing the stale read problem.
     
    -    There is no need to store the complete pack order here, since the only
    -    way a fixed set of packs can result in a change in the object ordering
    -    is by changing the preferred pack. That's because packs are sorted
    -    according to their pack name only, so it would be impossible to induce a
    -    different order by, say, touching a pack's mtime.
    -
    -    But storing the complete pack order gives us some flexibility in the
    -    future, and it only costs 4 bytes per pack to do. In the future, we
    -    could also optimize .rev reads looking for the identity of the preferred
    -    pack by consulting this list if it exists.
    +    In theory, it is possible to store a "fingerprint" of the full object
    +    order here, so long as that fingerprint changes at least as often as the
    +    full object order does. Some possibilities here include storing the
    +    identity of the preferred pack, along with the mtimes of the
    +    non-preferred packs in a consistent order. But storing a limited part of
    +    the information makes it difficult to reason about whether or not there
    +    are gaps between the two that would cause us to get bitten by this bug
    +    again.
     
         Signed-off-by: Taylor Blau <me@ttaylorr.com>
     
    @@ Documentation/technical/multi-pack-index.txt: and their offsets into multiple pa
        - An offset within the jth packfile for the object.
      - If large offsets are required, we use another list of large
        offsets similar to version 2 pack-indexes.
    -+- An optional list of pack IDs in bitmap-sorted order.
    ++- An optional list of objects in pseudo-pack order (used with MIDX bitmaps).
      
      Thus, we can provide O(log N) lookup time for any number
      of packfiles.
    @@ Documentation/technical/pack-format.txt: CHUNK DATA:
      	[Optional] Object Large Offsets (ID: {'L', 'O', 'F', 'F'})
      	    8-byte offsets into large packfiles.
      
    -+	[Optional] Bitmap pack order (ID: {'P', 'O', 'R', 'D'})
    -+	    A list of 4-byte pack identifiers in sorted order according to
    -+	    their relative bitmap positions.
    ++	[Optional] Bitmap pack order (ID: {'R', 'I', 'D', 'X'})
    ++	    A list of MIDX positions (one per object in the MIDX, num_objects in
    ++	    total, each a 4-byte unsigned integer in network byte order), sorted
    ++	    according to their relative bitmap/pseudo-pack positions.
     +
      TRAILER:
      
      	Index checksum of the above contents.
    +@@ Documentation/technical/pack-format.txt: In short, a MIDX's pseudo-pack is the de-duplicated concatenation of
    + objects in packs stored by the MIDX, laid out in pack order, and the
    + packs arranged in MIDX order (with the preferred pack coming first).
    + 
    +-Finally, note that the MIDX's reverse index is not stored as a chunk in
    +-the multi-pack-index itself. This is done because the reverse index
    +-includes the checksum of the pack or MIDX to which it belongs, which
    +-makes it impossible to write in the MIDX. To avoid races when rewriting
    +-the MIDX, a MIDX reverse index includes the MIDX's checksum in its
    +-filename (e.g., `multi-pack-index-xyz.rev`).
    ++The MIDX's reverse index is stored in the optional 'RIDX' chunk within
    ++the MIDX itself.
     
      ## midx.c ##
     @@
      #define MIDX_CHUNKID_OIDLOOKUP 0x4f49444c /* "OIDL" */
      #define MIDX_CHUNKID_OBJECTOFFSETS 0x4f4f4646 /* "OOFF" */
      #define MIDX_CHUNKID_LARGEOFFSETS 0x4c4f4646 /* "LOFF" */
    -+#define MIDX_CHUNKID_PACKORDER 0x504f5244 /* "PORD" */
    ++#define MIDX_CHUNKID_REVINDEX 0x52494458 /* "RIDX" */
      #define MIDX_CHUNK_FANOUT_SIZE (sizeof(uint32_t) * 256)
      #define MIDX_CHUNK_OFFSET_WIDTH (2 * sizeof(uint32_t))
      #define MIDX_CHUNK_LARGE_OFFSET_WIDTH (sizeof(uint64_t))
    @@ midx.c: static int write_midx_large_offsets(struct hashfile *f,
      	return 0;
      }
      
    -+static int write_midx_pack_order(struct hashfile *f,
    -+				 void *data)
    ++static int write_midx_revindex(struct hashfile *f,
    ++			       void *data)
     +{
     +	struct write_midx_context *ctx = data;
     +	uint32_t i;
    @@ midx.c: static int write_midx_internal(const char *object_dir,
      
     +	if (flags & (MIDX_WRITE_REV_INDEX | MIDX_WRITE_BITMAP)) {
     +		ctx.pack_order = midx_pack_order(&ctx);
    -+		add_chunk(cf, MIDX_CHUNKID_PACKORDER,
    ++		add_chunk(cf, MIDX_CHUNKID_REVINDEX,
     +			  ctx.entries_nr * sizeof(uint32_t),
    -+			  write_midx_pack_order);
    ++			  write_midx_revindex);
     +	}
     +
      	write_midx_header(f, get_num_chunks(cf), ctx.nr - dropped_packs);
-:  ---------- > 3:  b630fea149 pack-revindex.c: instrument loading on-disk reverse index
-:  ---------- > 4:  f430b6f2e9 t5326: drop unnecessary setup
-:  ---------- > 5:  73faab9f42 t5326: extract `test_rev_exists`
-:  ---------- > 6:  bf42b116e1 t5326: move tests to t/lib-bitmap.sh
-:  ---------- > 7:  fa91631024 t/lib-bitmap.sh: parameterize tests over reverse index source
-:  ---------- > 8:  993bfa8dd8 midx: read `RIDX` chunk when present
-- 
2.34.1.25.gb3157a20e6

  parent reply	other threads:[~2021-12-14  1:55 UTC|newest]

Thread overview: 87+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-12-08 19:26 [PATCH 0/2] midx: prevent bitmap corruption when permuting pack order Taylor Blau
2021-12-08 19:26 ` [PATCH 1/2] t5326: demonstrate bitmap corruption after permutation Taylor Blau
2021-12-08 19:26 ` [PATCH 2/2] midx.c: make changing the preferred pack safe Taylor Blau
2021-12-08 19:30 ` [PATCH 0/2] midx: prevent bitmap corruption when permuting pack order Derrick Stolee
2021-12-08 19:55   ` Jeff King
2021-12-10 18:36     ` Taylor Blau
2021-12-10 22:31       ` Taylor Blau
2021-12-11  1:39         ` Taylor Blau
2021-12-13 14:00           ` Derrick Stolee
2021-12-13 14:31             ` Taylor Blau
2021-12-14  1:55 ` Taylor Blau [this message]
2021-12-14  1:55   ` [PATCH v2 1/8] t5326: demonstrate bitmap corruption after permutation Taylor Blau
2021-12-14  1:55   ` [PATCH v2 2/8] midx.c: make changing the preferred pack safe Taylor Blau
2021-12-14  1:55   ` [PATCH v2 3/8] pack-revindex.c: instrument loading on-disk reverse index Taylor Blau
2021-12-14  1:55   ` [PATCH v2 4/8] t5326: drop unnecessary setup Taylor Blau
2021-12-14  1:55   ` [PATCH v2 5/8] t5326: extract `test_rev_exists` Taylor Blau
2021-12-20 18:33     ` Derrick Stolee
2022-01-04 15:33       ` Taylor Blau
2021-12-14  1:55   ` [PATCH v2 6/8] t5326: move tests to t/lib-bitmap.sh Taylor Blau
2021-12-14  1:55   ` [PATCH v2 7/8] t/lib-bitmap.sh: parameterize tests over reverse index source Taylor Blau
2021-12-14  1:55   ` [PATCH v2 8/8] midx: read `RIDX` chunk when present Taylor Blau
2021-12-20 18:42     ` Derrick Stolee
2022-01-04 15:21       ` Taylor Blau
2021-12-15 19:46   ` [PATCH v2 0/8] midx: prevent bitmap corruption when permuting pack order Junio C Hamano
2021-12-15 21:37     ` Taylor Blau
2021-12-15 22:17       ` Junio C Hamano
2021-12-15 22:55         ` Junio C Hamano
2021-12-20 18:51     ` Derrick Stolee
2021-12-20 19:52       ` Taylor Blau
2021-12-20 20:09         ` Derrick Stolee
2021-12-15 22:58   ` Junio C Hamano
2021-12-15 23:01     ` Taylor Blau
2022-01-04 18:15 ` [PATCH v3 0/9] " Taylor Blau
2022-01-04 18:15   ` [PATCH v3 1/9] t5326: demonstrate bitmap corruption after permutation Taylor Blau
2022-01-20 17:55     ` Jonathan Tan
2022-01-20 22:11       ` Taylor Blau
2022-01-20 22:41         ` Junio C Hamano
2022-01-20 22:46           ` Taylor Blau
2022-01-24 17:40         ` Jonathan Tan
2022-01-04 18:15   ` [PATCH v3 2/9] midx.c: make changing the preferred pack safe Taylor Blau
2022-01-14 21:35     ` Junio C Hamano
2022-01-14 21:43       ` Junio C Hamano
2022-01-15  0:59         ` Taylor Blau
2022-01-15  6:27           ` Junio C Hamano
2022-01-20 18:08     ` Jonathan Tan
2022-01-20 22:13       ` Taylor Blau
2022-01-04 18:15   ` [PATCH v3 3/9] pack-revindex.c: instrument loading on-disk reverse index Taylor Blau
2022-01-20 18:15     ` Jonathan Tan
2022-01-20 22:18       ` Taylor Blau
2022-01-24 17:53         ` Jonathan Tan
2022-01-04 18:15   ` [PATCH v3 4/9] t5326: drop unnecessary setup Taylor Blau
2022-01-04 18:15   ` [PATCH v3 5/9] t5326: extract `test_rev_exists` Taylor Blau
2022-01-04 18:15   ` [PATCH v3 6/9] t5326: move tests to t/lib-bitmap.sh Taylor Blau
2022-01-04 18:15   ` [PATCH v3 7/9] t/lib-bitmap.sh: parameterize tests over reverse index source Taylor Blau
2022-01-24 19:15     ` Jonathan Tan
2022-01-25 21:40       ` Taylor Blau
2022-01-26 21:00         ` Jonathan Tan
2022-01-04 18:16   ` [PATCH v3 8/9] midx: read `RIDX` chunk when present Taylor Blau
2022-01-24 19:27     ` Jonathan Tan
2022-01-25 21:45       ` Taylor Blau
2022-01-26 21:28         ` Jonathan Tan
2022-01-04 18:16   ` [PATCH v3 9/9] pack-bitmap.c: gracefully fallback after opening pack/MIDX Taylor Blau
2022-01-24 19:29     ` Jonathan Tan
2022-01-25 21:46       ` Taylor Blau
2022-01-25 22:40 ` [PATCH v4 0/9] midx: prevent bitmap corruption when permuting pack order Taylor Blau
2022-01-25 22:41   ` [PATCH v4 1/9] t5326: demonstrate bitmap corruption after permutation Taylor Blau
2022-01-26 15:01     ` Ævar Arnfjörð Bjarmason
2022-01-26 20:18       ` Taylor Blau
2022-01-25 22:41   ` [PATCH v4 2/9] midx.c: make changing the preferred pack safe Taylor Blau
2022-01-25 22:41   ` [PATCH v4 3/9] pack-revindex.c: instrument loading on-disk reverse index Taylor Blau
2022-01-26 15:03     ` Ævar Arnfjörð Bjarmason
2022-01-25 22:41   ` [PATCH v4 4/9] t5326: drop unnecessary setup Taylor Blau
2022-01-25 22:41   ` [PATCH v4 5/9] t5326: extract `test_rev_exists` Taylor Blau
2022-01-26 15:04     ` Ævar Arnfjörð Bjarmason
2022-01-26 20:19       ` Taylor Blau
2022-01-25 22:41   ` [PATCH v4 6/9] t5326: move tests to t/lib-bitmap.sh Taylor Blau
2022-01-25 22:41   ` [PATCH v4 7/9] t/lib-bitmap.sh: parameterize tests over reverse index source Taylor Blau
2022-01-25 22:41   ` [PATCH v4 8/9] midx: read `RIDX` chunk when present Taylor Blau
2022-01-26 15:10     ` Ævar Arnfjörð Bjarmason
2022-01-26 20:23       ` Taylor Blau
2022-01-25 22:41   ` [PATCH v4 9/9] pack-bitmap.c: gracefully fallback after opening pack/MIDX Taylor Blau
2022-01-26 15:08     ` Ævar Arnfjörð Bjarmason
2022-01-26 17:50   ` [PATCH v4 0/9] midx: prevent bitmap corruption when permuting pack order Ævar Arnfjörð Bjarmason
2022-01-26 20:24     ` Taylor Blau
2022-01-27 17:15       ` Jonathan Tan
2022-02-24 22:50         ` Taylor Blau
2022-01-27 14:13   ` Derrick Stolee

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=cover.1639446906.git.me@ttaylorr.com \
    --to=me@ttaylorr.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=peff@peff.net \
    --cc=stolee@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).