From: Victoria Dye <vdye@github.com>
To: Taylor Blau <me@ttaylorr.com>, git@vger.kernel.org
Cc: gitster@pobox.com
Subject: Re: [PATCH 2/3] t7703: demonstrate object corruption with pack.packSizeLimit
Date: Fri, 20 May 2022 12:42:58 -0700 [thread overview]
Message-ID: <a060d672-df82-95a3-072a-7ab7b0566556@github.com> (raw)
In-Reply-To: <08da02fa74c211ae1019cb0a9f4e30cc239e1ab9.1653073280.git.me@ttaylorr.com>
Taylor Blau wrote:
> When doing a `--geometric=<d>` repack, `git repack` determines a
> splitting point among packs ordered by their object count such that:
>
> - each pack above the split has at least `<d>` times as many objects
> as the next-largest pack by object count, and
> - the first pack above the split has at least `<d>` times as many
> object as the sum of all packs below the split line combined
>
> `git repack` then creates a pack containing all of the objects contained
> in packs below the split line by running `git pack-objects
> --stdin-packs` underneath. Once packs are moved into place, then any
> packs below the split line are removed, since their objects were just
> combined into a new pack.
>
> But `git repack` tries to be careful to avoid removing a pack that it
> just wrote, by checking:
>
> struct packed_git *p = geometry->pack[i];
> if (string_list_has_string(&names, hash_to_hex(p->hash)))
> continue;
>
> in the `delete_redundant` and `geometric` conditional towards the end of
> `cmd_repack`.
>
> But it's possible to trick `git repack` into not recognizing a pack that
> it just wrote when `names` is out-of-order (which violates
> `string_list_has_string()`'s assumption that the list is sorted and thus
> binary search-able).
>
> When this happens in just the right circumstances, it is possible to
> remove a pack that we just wrote, leading to object corruption.
>
> Luckily, this is quite difficult to provoke in practice (for a couple of
> reasons):
>
> - we ordinarily write just one pack, so `names` usually contains just
> one entry, and is thus sorted
> - when we do write more than one pack (e.g., due to `--max-pack-size`)
> we have to: (a) write a pack identical to one that already
> exists, (b) have that pack be below the split line, and (c) have
> the set of packs written by `pack-objects` occur in an order which
> tricks `string_list_has_string()`.
>
> Demonstrate the above scenario in a failing test, which causes `git
> repack --geometric` to write a pack which occurs below the split line,
> _and_ fail to recognize that it wrote that pack.
>
> The following patch will fix this bug.
>
> Signed-off-by: Taylor Blau <me@ttaylorr.com>
> ---
> t/t7703-repack-geometric.sh | 47 +++++++++++++++++++++++++++++++++++++
> 1 file changed, 47 insertions(+)
>
> diff --git a/t/t7703-repack-geometric.sh b/t/t7703-repack-geometric.sh
> index 91bb2b37a8..2cd1de7295 100755
> --- a/t/t7703-repack-geometric.sh
> +++ b/t/t7703-repack-geometric.sh
> @@ -7,6 +7,7 @@ test_description='git repack --geometric works correctly'
> GIT_TEST_MULTI_PACK_INDEX=0
>
> objdir=.git/objects
> +packdir=$objdir/pack
> midx=$objdir/pack/multi-pack-index
>
> test_expect_success '--geometric with no packs' '
> @@ -230,4 +231,50 @@ test_expect_success '--geometric chooses largest MIDX preferred pack' '
> )
> '
>
> +test_expect_failure '--geometric with pack.packSizeLimit' '
> + git init pack-rewrite &&
> + test_when_finished "rm -fr pack-rewrite" &&
> + (
> + cd pack-rewrite &&
> +
> + test-tool genrandom foo 1048576 >foo &&
> + test-tool genrandom bar 1048576 >bar &&
> +
I was a bit worried about this test being flaky in the future (relying on
particular pseudorandomly-generated file contents and the subsequent
ordering of hashes on the packs). But, since neither 'genrandom' nor the
pack hash generation seem likely to change (and I can't come up with an
alternative to this approach anyway), the test looks good as-is.
> + git add foo bar &&
> + test_tick &&
> + git commit -m base &&
> +
> + git rev-parse HEAD:foo HEAD:bar >p1.objects &&
> + git rev-parse HEAD HEAD^{tree} >p2.objects &&
> +
> + # These two packs each contain two objects, so the following
> + # `--geometric` repack will try to combine them.
> + p1="$(git pack-objects $packdir/pack <p1.objects)" &&
> + p2="$(git pack-objects $packdir/pack <p2.objects)" &&
> +
> + # Remove any loose objects in packs, since we do not want extra
> + # copies around (which would mask over potential object
> + # corruption issues).
> + git prune-packed &&
> +
> + # Both p1 and p2 will be rolled up, but pack-objects will write
> + # three packs:
> + #
> + # - one containing object "foo",
> + # - another containing object "bar",
> + # - a final pack containing the commit and tree objects
> + # (identical to p2 above)
> + git repack --geometric 2 -d --max-pack-size=1048576 &&
> +
> + # Ensure `repack` can detect that the third pack it wrote
> + # (containing just the tree and commit objects) was identical to
> + # one that was below the geometric split, so that we can save it
> + # from deletion.
> + #
> + # If `repack` fails to do that, we will incorrectly delete p2,
> + # causing object corruption.
> + git fsck
> + )
> +'
> +
> test_done
next prev parent reply other threads:[~2022-05-20 19:43 UTC|newest]
Thread overview: 10+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-05-20 19:01 [PATCH 0/3] repack: handle --keep-pack, --max-pack-size for geometric repacks Taylor Blau
2022-05-20 19:01 ` [PATCH 1/3] repack: respect --keep-pack with geometric repack Taylor Blau
2022-05-20 19:01 ` [PATCH 2/3] t7703: demonstrate object corruption with pack.packSizeLimit Taylor Blau
2022-05-20 19:42 ` Victoria Dye [this message]
2022-05-20 23:22 ` Taylor Blau
2022-05-20 20:54 ` Junio C Hamano
2022-05-20 19:01 ` [PATCH 3/3] builtin/repack.c: ensure that `names` is sorted Taylor Blau
2022-05-20 19:46 ` [PATCH 0/3] repack: handle --keep-pack, --max-pack-size for geometric repacks Victoria Dye
2022-05-20 20:05 ` Derrick Stolee
2022-05-20 20:55 ` Junio C Hamano
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
List information: http://vger.kernel.org/majordomo-info.html
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=a060d672-df82-95a3-072a-7ab7b0566556@github.com \
--to=vdye@github.com \
--cc=git@vger.kernel.org \
--cc=gitster@pobox.com \
--cc=me@ttaylorr.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
Code repositories for project(s) associated with this public inbox
https://80x24.org/mirrors/git.git
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).