git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Derrick Stolee <derrickstolee@github.com>
To: Taylor Blau <me@ttaylorr.com>
Cc: git@vger.kernel.org, Jeff King <peff@peff.net>,
	Junio C Hamano <gitster@pobox.com>
Subject: Re: [PATCH] builtin/pack-objects.c: introduce `pack.extraCruftTips`
Date: Wed, 26 Apr 2023 06:52:26 -0400	[thread overview]
Message-ID: <97d84abc-9b25-9d43-503e-264b33c223d4@github.com> (raw)
In-Reply-To: <ZEhFScCoAgGGM/th@nand.local>

On 4/25/2023 5:25 PM, Taylor Blau wrote:
> On Tue, Apr 25, 2023 at 03:42:51PM -0400, Derrick Stolee wrote:
>> On 4/20/2023 1:27 PM, Taylor Blau wrote:

>>> The implementation is straightforward: after determining the set of
>>> objects to pack into the cruft pack (either by calling
>>> `enumerate_cruft_objects()` or `enumerate_and_traverse_cruft_objects()`)
>>> we figure out if there are any program(s) we need to consult in order to
>>> determine if there are any objects which we need to "rescue". We then
>>> add those as tips to another reachability traversal, marking every
>>> object along the way as cruft (and thus retaining it in the cruft pack).
>>
>> I initially wondered why we would need a second walk, and I _think_
>> one reason is that we don't want to update the mtimes for these
>> objects as if they were reachable. The next point about the reachable
>> pack is also critical.
> 
> Their mtimes might be updated during that second walk, but the key point
> is that we want to rescue any objects that the extra tips might reach
> (but aren't listed themselves in the output of one of the hooks).
> 
> The idea there is similar to the rescuing pass we do for pruning GC's,
> where we'll save any objects which would have been pruned, but are
> reachable from another object which won't yet be pruned.
> 
>>> Instead, keep these unreachable objects in the cruft pack instead, to
>>> ensure that we can continue to have a pack containing just reachable
>>> objects, which is always safe to write a bitmap over.
>>
>> Makes sense. Also: don't update their mtime so they could be removed
>> immediately if the external pointers change.
> 
> I don't think I'm quite following why we would want to leave their
> mtimes alone here. I think we'd want their mtimes to be accurate as of
> the last time we updated the *.mtimes file so that if they (a) no longer
> appear in the output of one of the extra-cruft hooks, and (b) they have
> been written recently, that they would not be pruned immediately.

You corrected me that we _are_ updating mtimes, so my understanding was
incorrect and this follow-up is incorrect. Thanks.

Is it possible to demonstrate this mtime-updating in a test?

>>> +	if (progress)
>>> +		progress_state = start_progress(_("Enumerating extra cruft tips"), 0);
>>
>> Should we be including two progress counts? or would it be sufficient
>> to say "traversing extra cruft objects" and include both of these steps
>> in that one progress mode?
> 
> I may be missing something, but there are already two progress meters
> here: one (above) for enumerating the list of extra cruft tips, which
> increments each time we read a new line that refers to an extant object,
> and another (below) for the number of objects that we visit along the
> second walk.

I meant: why two counts? Should we instead only have one?

>>> +	if (prepare_revision_walk(&revs))
>>> +		die(_("revision walk setup failed"));
>>> +	if (progress)
>>> +		progress_state = start_progress(_("Traversing extra cruft objects"), 0);
>>> +	nr_seen = 0;
>>> +	traverse_commit_list(&revs, show_cruft_commit, show_cruft_object, NULL);
>>
>> Hm. I'm trying to look at these helpers and figure out what they
>> are doing to prevent these objects from being pruned. This could
>> use a comment or something, as I'm unable to grok it at present.
> 
> Very fair, the show_cruft_object() callback calls
> add_cruft_object_entry(), which adds the visited object to the cruft
> pack regardless of its mtime with respect to pruning.

Ah, thanks. That helps a lot.
 
> A comment here might look like:
> 
>     /* retain all objects reachable from extra tips via show_cruft_object() */
> 
> or something.

Sounds good.

>> Could we store this commit to make sure it is removed from the
>> repository later?
> 
> Possibly, though I think we already have good coverage of this in other
> tests. So I'd be equally happy to assert on the exact contents of the
> cruft pack (which wouldn't include the above commit), but I'm happy to
> assert on it more directly if you feel strongly.

This is a case of me thinking "can we test the test?" to make sure
the test is doing everything we think it is doing...

>>> +		git checkout --orphan old &&
>>> +		git rm -fr . &&
>>> +		test_commit --no-tag cruft.old &&
>>> +		cruft_old="$(git rev-parse HEAD)" &&
>>> +
>>> +		git checkout --orphan new &&
>>> +		git rm -fr . &&
>>> +		test_commit --no-tag cruft.new &&
>>> +		cruft_new="$(git rev-parse HEAD)" &&
>>> +
>>> +		git checkout main &&
>>> +		git branch -D old new &&
>>
>> Do you need to delete the 'discard' branch here?
> 
> Great catch. I didn't notice it here because even though it was
> reachable (and not mentioned as an extra tip), so didn't appear in the
> cruft pack.

...so we catch things like this automatically.

>>> +		mtimes="$(ls .git/objects/pack/pack-*.mtimes)" &&
>>> +		git show-index <${mtimes%.mtimes}.idx >cruft &&
>>> +		cut -d" " -f2 cruft | sort >cruft.actual &&
>>> +		test_cmp cruft.expect cruft.actual
>>
>> One thing that would be good, as a safety check, is to remove
>> the pack.extraCruftTips config and re-run the repack and be
>> sure that we are pruning the objects previously saved by the
>> hook.
> 
> Good call. I think this amounts to:
> 
> --- 8< ---
> diff --git a/t/t5329-pack-objects-cruft.sh b/t/t5329-pack-objects-cruft.sh
> index 1260e11d41..44852e6925 100755
> --- a/t/t5329-pack-objects-cruft.sh
> +++ b/t/t5329-pack-objects-cruft.sh
> @@ -773,7 +773,7 @@ test_expect_success 'additional cruft tips may be specified via pack.extraCruftT
>  		cruft_new="$(git rev-parse HEAD)" &&
> 
>  		git checkout main &&
> -		git branch -D old new &&
> +		git branch -D discard old new &&
>  		git reflog expire --all --expire=all &&
> 
>  		# mark cruft.old with an mtime that is many minutes
> @@ -802,6 +802,19 @@ test_expect_success 'additional cruft tips may be specified via pack.extraCruftT
>  		mtimes="$(ls .git/objects/pack/pack-*.mtimes)" &&
>  		git show-index <${mtimes%.mtimes}.idx >cruft &&
>  		cut -d" " -f2 cruft | sort >cruft.actual &&
> +		test_cmp cruft.expect cruft.actual &&

If we store "discard=$(git rev-parse discard)" earlier, we can also
check

		test_must_fail git show $discard &&

> +		# Ensure that the "old" objects are removed after
> +		# dropping the pack.extraCruftTips hook.
> +		git config --unset pack.extraCruftTips &&
> +		git repack --cruft --cruft-expiration=now -d &&
> +
> +		mtimes="$(ls .git/objects/pack/pack-*.mtimes)" &&
> +		git show-index <${mtimes%.mtimes}.idx >cruft &&
> +		cut -d" " -f2 cruft | sort >cruft.actual &&
> +
> +		git rev-list --objects --no-object-names $cruft_new >cruft.raw &&

I like how $cruft_new is still kept because its mtime is still
"later than now".

> +		sort cruft.raw >cruft.expect &&
>  		test_cmp cruft.expect cruft.actual
>  	)
>  '
> --- >8 ---
> 
>> It would be helpful to include a test for the multi-value case where
>> multiple scripts are executed and maybe include an "exit 1" in some
>> of them?
> 
> Definitely, that is a great suggestion (especially because it would have
> caught the "if (!ret)" bug that you pointed out above).
> 
> Thanks for a thorough review. I'll wait for other feedback before
> re-rolling, or do so in a couple of days if I haven't heard anything.

Thanks,
-Stolee

  reply	other threads:[~2023-04-26 10:54 UTC|newest]

Thread overview: 31+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-04-20 17:27 [PATCH] builtin/pack-objects.c: introduce `pack.extraCruftTips` Taylor Blau
2023-04-20 18:12 ` Junio C Hamano
2023-04-20 19:30   ` Taylor Blau
2023-04-20 19:52     ` Junio C Hamano
2023-04-20 20:48       ` Taylor Blau
2023-04-21  0:10 ` Chris Torek
2023-04-21  2:14   ` Taylor Blau
2023-04-25 19:42 ` Derrick Stolee
2023-04-25 21:25   ` Taylor Blau
2023-04-26 10:52     ` Derrick Stolee [this message]
2023-05-03  0:06       ` Taylor Blau
2023-05-03  0:09 ` [PATCH v2] " Taylor Blau
2023-05-03 14:01   ` Derrick Stolee
2023-05-03 19:59   ` Jeff King
2023-05-03 21:22     ` Taylor Blau
2023-05-05 21:23       ` Jeff King
2023-05-06  0:06         ` Taylor Blau
2023-05-06  0:14           ` Taylor Blau
2023-05-03 21:28     ` Taylor Blau
2023-05-05 21:26       ` Jeff King
2023-05-05 22:13         ` Jeff King
2023-05-06  0:13           ` Taylor Blau
2023-05-06  0:20             ` Taylor Blau
2023-05-06  2:12             ` Jeff King
2023-05-03 22:05 ` [PATCH v3] " Taylor Blau
2023-05-03 23:18   ` Junio C Hamano
2023-05-03 23:42     ` Junio C Hamano
2023-05-03 23:48       ` Taylor Blau
2023-05-03 23:50       ` Taylor Blau
2023-05-05 21:39     ` Jeff King
2023-05-05 22:19   ` Jeff King

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=97d84abc-9b25-9d43-503e-264b33c223d4@github.com \
    --to=derrickstolee@github.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=me@ttaylorr.com \
    --cc=peff@peff.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).