From: Taylor Blau <me@ttaylorr.com>
To: Derrick Stolee <stolee@gmail.com>
Cc: Taylor Blau <me@ttaylorr.com>, Jeff King <peff@peff.net>,
git@vger.kernel.org, dstolee@microsoft.com
Subject: Re: [PATCH] builtin/repack.c: invalidate MIDX only when necessary
Date: Tue, 25 Aug 2020 11:42:24 -0400
Message-ID: <20200825154224.GA9116@syl.lan>
In-Reply-To: <6a34d7ee-8c6b-8c6c-93bd-0013dccccafb@gmail.com>

On Tue, Aug 25, 2020 at 11:14:41AM -0400, Derrick Stolee wrote:
> On 8/25/2020 10:41 AM, Taylor Blau wrote:
> > On Tue, Aug 25, 2020 at 09:14:19AM -0400, Derrick Stolee wrote:
> >> The code in builtin/repack.c looks good for sure. I have a quick question
> >> about this new test:
> >>
> >> +test_expect_success 'repack preserves multi-pack-index when deleting unknown packs' '
> >> + git multi-pack-index write &&
> >> + cp $objdir/pack/multi-pack-index $objdir/pack/multi-pack-index.bak &&
> >> + test_when_finished "rm -f $objdir/pack/multi-pack-index.bak" &&
> >> +
> >> + # Write a new pack that is unknown to the multi-pack-index.
> >> + git hash-object -w </dev/null >blob &&
> >> + git pack-objects $objdir/pack/pack <blob &&
> >> +
> >> + GIT_TEST_MULTI_PACK_INDEX=0 git -c core.multiPackIndex repack -d &&
> >> + test_cmp_bin $objdir/pack/multi-pack-index \
> >> + $objdir/pack/multi-pack-index.bak
> >> +'
> >> +
> >>
> >> You create an arbitrary blob, and then add it to a pack-file. Do we
> >> know that 'git repack' is definitely creating a new pack-file that makes
> >> our manually-created pack-file redundant?
> >>
> >> My suggestion is to have the test check itself:
> >>
> >> +test_expect_success 'repack preserves multi-pack-index when deleting unknown packs' '
> >> + git multi-pack-index write &&
> >> + cp $objdir/pack/multi-pack-index $objdir/pack/multi-pack-index.bak &&
> >> + test_when_finished "rm -f $objdir/pack/multi-pack-index.bak" &&
> >> +
> >> + # Write a new pack that is unknown to the multi-pack-index.
> >> + git hash-object -w </dev/null >blob &&
> >> + HASH=$(git pack-objects $objdir/pack/pack <blob) &&
> >> +
> >> + GIT_TEST_MULTI_PACK_INDEX=0 git -c core.multiPackIndex repack -d &&
> >> + test_cmp_bin $objdir/pack/multi-pack-index \
> >> + $objdir/pack/multi-pack-index.bak &&
> >> + test_path_is_missing $objdir/pack/pack-$HASH.pack
> >> +'
> >> +
> >>
> >> This test fails for me, on the 'test_path_is_missing'. Likely, the
> >> blob is seen as already in a pack-file so is just pruned by 'git repack'
> >> instead. I thought that perhaps we need to add a new pack ourselves that
> >> overrides the small pack. Here is my attempt:
> >>
> >> test_expect_success 'repack preserves multi-pack-index when deleting unknown packs' '
> >> git multi-pack-index write &&
> >> cp $objdir/pack/multi-pack-index $objdir/pack/multi-pack-index.bak &&
> >> test_when_finished "rm -f $objdir/pack/multi-pack-index.bak" &&
> >>
> >> # Write a new pack that is unknown to the multi-pack-index.
> >> BLOB1=$(echo blob1 | git hash-object -w --stdin) &&
> >> BLOB2=$(echo blob2 | git hash-object -w --stdin) &&
> >> cat >blobs <<-EOF &&
> >> $BLOB1
> >> $BLOB2
> >> EOF
> >> HASH1=$(echo $BLOB1 | git pack-objects $objdir/pack/pack) &&
> >> HASH2=$(git pack-objects $objdir/pack/pack <blobs) &&
> >> GIT_TEST_MULTI_PACK_INDEX=0 git -c core.multiPackIndex repack -d &&
> >> test_cmp_bin $objdir/pack/multi-pack-index \
> >> $objdir/pack/multi-pack-index.bak &&
> >> test_path_is_file $objdir/pack/pack-$HASH2.pack &&
> >> test_path_is_missing $objdir/pack/pack-$HASH1.pack
> >> '
> >>
> >> However, this _still_ fails on the "test_path_is_missing" line, so I'm not sure
> >> how to make sure your logic is tested. I saw that 'git repack' was writing
> >> "nothing new to pack" in the output, so I also tested adding a few commits and
> >> trying to force it to repack reachable data, but I cannot seem to trigger it
> >> to create a new pack that overrides only one pack that is not in the MIDX.
> >>
> >> Likely, I just don't know how 'git repack' works well enough to trigger this
> >> behavior. But the test as-is is not testing what you want it to test.
> >
> > I think this case might actually be impossible to tickle in a test. I
> > thought that 'git repack -d' looked for existing packs whose objects are
> > a subset of some new pack generated. But, it's much simpler than that:
> > '-d' by itself just looks for packs that were already on disk with the
> > same SHA-1 as a new pack, and it removes the old one.
>
> If 'git repack' never calls remove_redundant_pack() unless we are doing
> a "full repack", then we could simplify this logic:
>
>  static void remove_redundant_pack(const char *dir_name, const char *base_name)
>  {
>  	struct strbuf buf = STRBUF_INIT;
> -	strbuf_addf(&buf, "%s/%s.pack", dir_name, base_name);
> +	struct multi_pack_index *m = get_multi_pack_index(the_repository);
> +	strbuf_addf(&buf, "%s.pack", base_name);
> +	if (m && midx_contains_pack(m, buf.buf))
> +		clear_midx_file(the_repository);
> +	strbuf_insertf(&buf, 0, "%s/", dir_name);
>  	unlink_pack_path(buf.buf, 1);
>  	strbuf_release(&buf);
>  }
>
> to
>
>  static void remove_redundant_pack(const char *dir_name, const char *base_name)
>  {
>  	struct strbuf buf = STRBUF_INIT;
>  	strbuf_addf(&buf, "%s/%s.pack", dir_name, base_name);
> +	clear_midx_file(the_repository);
>  	unlink_pack_path(buf.buf, 1);
>  	strbuf_release(&buf);
>  }
>
> and get the same results as we are showing in these tests. This does
> move us incrementally to a better situation: don't delete the MIDX
> if we aren't deleting pack files. But, I think we can get around it.

Makes sense, but having read the rest of your email, I think we are
better off leaving this as-is and changing the test so that it actually
exercises this check.

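For reference, the condition we are keeping boils down to "drop the MIDX
only if it knows about the pack being removed". Here is a minimal shell
sketch of that decision. Both 'known_packs' (a text file standing in for
the MIDX's pack list) and 'should_clear_midx' are made up for
illustration; neither exists in Git, and the pack names are placeholders:

```shell
#!/bin/sh
# Hypothetical stand-in for the pack names recorded in the MIDX.
workdir=$(mktemp -d)
known_packs="$workdir/known_packs"
printf '%s\n' pack-aaaa.pack pack-bbbb.pack >"$known_packs"

# Succeeds only when the named pack is listed, i.e. when removing it
# would leave the index referring to a pack that no longer exists.
should_clear_midx () {
	grep -qxF "$1" "$known_packs"
}

for pack in pack-aaaa.pack pack-cccc.pack
do
	if should_clear_midx "$pack"
	then
		echo "$pack: known to the index, clear it"
	else
		echo "$pack: unknown to the index, keep it"
	fi
done
```

The real check is the midx_contains_pack() call in the hunk quoted
above; the sketch only mirrors its shape.
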
> > Note that 'git repack' uses 'git pack-objects' internally to find
> > objects and generate a packfile. When calling 'git pack-objects', 'git
> > repack -d' passes '--all' and '--unpacked', which means that there is no
> > way we'd generate a new pack with the same SHA-1 as an existing pack
> > ordinarily.
> >
> > So, I think this case is impossible, or at least astronomically
> > unlikely. What is more interesting (and untested) is that adding a _new_
> > pack doesn't cause us to invalidate the MIDX. Here's a patch that does
> > that:
> >
> > diff --git a/t/t5319-multi-pack-index.sh b/t/t5319-multi-pack-index.sh
> > index 16a1ad040e..620f2058d6 100755
> > --- a/t/t5319-multi-pack-index.sh
> > +++ b/t/t5319-multi-pack-index.sh
> > @@ -391,18 +391,27 @@ test_expect_success 'repack removes multi-pack-index when deleting packs' '
> > test_path_is_missing $objdir/pack/multi-pack-index
> > '
> >
> > -test_expect_success 'repack preserves multi-pack-index when deleting unknown packs' '
> > - git multi-pack-index write &&
> > - cp $objdir/pack/multi-pack-index $objdir/pack/multi-pack-index.bak &&
> > - test_when_finished "rm -f $objdir/pack/multi-pack-index.bak" &&
> > -
> > - # Write a new pack that is unknown to the multi-pack-index.
> > - git hash-object -w </dev/null >blob &&
> > - git pack-objects $objdir/pack/pack <blob &&
> > -
> > - GIT_TEST_MULTI_PACK_INDEX=0 git -c core.multiPackIndex repack -d &&
> > - test_cmp_bin $objdir/pack/multi-pack-index \
> > - $objdir/pack/multi-pack-index.bak
> > +test_expect_success 'repack preserves multi-pack-index when creating packs' '
> > + git init preserve &&
> > + test_when_finished "rm -fr preserve" &&
> > + (
> > + cd preserve &&
> > + midx=.git/objects/pack/multi-pack-index &&
> > +
> > + test_commit "initial" &&
> > + git repack -ad &&
> > + git multi-pack-index write &&
> > + ls .git/objects/pack | grep "\.pack$" >before &&
> > +
> > + cp $midx $midx.bak &&
> > +
> > + test_commit "another" &&
> > + GIT_TEST_MULTI_PACK_INDEX=0 git -c core.multiPackIndex repack -d &&
> > + ls .git/objects/pack | grep "\.pack$" >after &&
> > +
> > + test_cmp_bin $midx.bak $midx &&
> > + ! test_cmp before after
> > + )
> > '
>
> After looking at the callers to remove_redundant_pack() I noticed that it is only
> called after inspecting the "names" struct, which contains the names of the packs
> to group into a new pack-file. We can use a .keep file to preserve the pack-file(s) in
> the MIDX but also to ensure multiple pack-files outside of the MIDX are repacked and
> deleted. While this is very unlikely in the wild, it is definitely possible.

Great idea.

> test_expect_success 'repack preserves multi-pack-index when deleting unknown packs' '
> 	git init preserve &&
> 	test_when_finished "rm -fr preserve" &&
> 	(
> 		cd preserve &&
> 		midx=.git/objects/pack/multi-pack-index &&
>
> 		test_commit 1 &&
> 		HASH1=$(git pack-objects --all .git/objects/pack/pack) &&
> 		touch .git/objects/pack/pack-$HASH1.keep &&
>
> 		cat >pack-input <<-\EOF &&

Escaping the heredoc shouldn't be necessary, so this can be written
instead as '<<-EOF'.

> 		HEAD
> 		^HEAD~1
> 		EOF
>
> 		test_commit 2 &&
> 		HASH2=$(git pack-objects --revs .git/objects/pack/pack <pack-input) &&
> 		touch .git/objects/pack/pack-$HASH2.keep &&
>
> 		git multi-pack-index write &&
> 		cp $midx $midx.bak &&
>
> 		test_commit 3 &&
> 		HASH3=$(git pack-objects --revs .git/objects/pack/pack <pack-input) &&
>
> 		test_commit 4 &&
> 		HASH4=$(git pack-objects --revs .git/objects/pack/pack <pack-input) &&
>
> 		GIT_TEST_MULTI_PACK_INDEX=0 git -c core.multiPackIndex repack -ad &&
> 		test_path_is_file .git/objects/pack/pack-$HASH1.pack &&
> 		test_path_is_file .git/objects/pack/pack-$HASH2.pack &&
> 		test_path_is_missing .git/objects/pack/pack-$HASH3.pack &&
> 		test_path_is_missing .git/objects/pack/pack-$HASH4.pack

...and we should check that 'test_cmp $midx.bak $midx' is clean, i.e.,
that we didn't touch the MIDX.

> 	)
> '
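
On the heredoc: quoting the delimiter only matters when the body
contains something the shell would expand (a '$', a backtick, or a
backslash). 'HEAD' and '^HEAD~1' contain none of those, so both
spellings should write the same bytes. A quick sketch to confirm it; the
temporary directory and file names are just for illustration:

```shell
#!/bin/sh
tmp=$(mktemp -d)

# Quoted delimiter: the body is taken literally.
cat >"$tmp/quoted" <<-\EOF
HEAD
^HEAD~1
EOF

# Unquoted delimiter: the body is subject to parameter expansion and
# command substitution, but there is nothing here to expand.
cat >"$tmp/unquoted" <<-EOF
HEAD
^HEAD~1
EOF

if cmp -s "$tmp/quoted" "$tmp/unquoted"
then
	echo "identical"
else
	echo "different"
fi
```

In the test itself the body is indented with tabs, which the '-' in
'<<-' strips in either spelling.
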
>
> I believe this checks your condition properly enough.

Otherwise, I think that this test looks great. Thanks for suggesting
it. I'll send a new patch now...

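For the extra MIDX assertion, the pattern is just "snapshot, run the
command, compare bytes". Outside the test suite, plain 'cmp' can play
the role of the comparison helper. Below is a sketch that uses a
stand-in file rather than a real multi-pack-index; 'run_repack' is a
hypothetical placeholder for the 'git repack -ad' invocation and
deliberately does nothing here:

```shell
#!/bin/sh
tmp=$(mktemp -d)
midx="$tmp/multi-pack-index"

# Stand-in contents; a real MIDX is a binary file written by Git.
printf 'MIDX\001\002\003' >"$midx"

# Snapshot the file before running the command under test.
cp "$midx" "$midx.bak"

# Hypothetical placeholder for the command under test.
run_repack () {
	:
}
run_repack

# Byte-for-byte comparison of the snapshot against the current file.
if cmp -s "$midx.bak" "$midx"
then
	midx_status=untouched
else
	midx_status=modified
fi
echo "multi-pack-index: $midx_status"
```

Since the MIDX is binary, a byte-wise comparison (as with the
test_cmp_bin calls earlier in this thread) avoids dumping a textual
diff of binary data on failure.
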
> Thanks,
> -Stolee

Thanks,
Taylor