From: Derrick Stolee <stolee@gmail.com>
To: Junio C Hamano <gitster@pobox.com>, Jeff King <peff@peff.net>
Cc: Taylor Blau <me@ttaylorr.com>,
git@vger.kernel.org, dstolee@microsoft.com
Subject: Re: [PATCH] builtin/repack.c: invalidate MIDX only when necessary
Date: Tue, 25 Aug 2020 12:18:42 -0400 [thread overview]
Message-ID: <c78b3108-b760-d252-9428-6f03549fea11@gmail.com> (raw)
In-Reply-To: <xmqqtuwq1zux.fsf@gitster.c.googlers.com>
On 8/25/2020 11:58 AM, Junio C Hamano wrote:
> Jeff King <peff@peff.net> writes:
>
>> Does "git repack" ever remove just one pack? Obviously "git repack -ad"
>> or "git repack -Ad" is going to pack everything and delete the old
>> packs. So I think we'd want to remove a midx there.
>
> AFAIK, the pack-redundant subcommand is used by nobody in-tree, so
> nobody is doing "there are three packfiles, but all the objects in
> one of them are contained in the other two, so let's remove that
> redundant one".
>
>> And "git repack -d" I think of as deleting only loose objects that we
>> just packed. But I guess it could also remove a pack that has now been
>> made redundant? That seems like a rare case in practice, but I suppose
>> is possible.
>
> Meaning it can become reality? Yes. Or it already can happen? I
> doubt it.
>
>> E.g., imagine we have a midx that points to packs A and B, and
>> git-repack deletes B. By your logic above, we need to remove the midx
>> because now it points to objects in B which aren't accessible. But by
>> deleting it, could we be deleting the only thing that mentions the
>> objects in A?
>>
>> I _think_ the answer is "no", because we never went all-in on midx and
>> allowed deleting the matching .idx files for contained packs.
>
> Yeah, it has been my assumption that that part of the design would
> never change.
Yes, we should intend to keep the .idx files around forever even when
using the multi-pack-index. The duplicated data overhead is not typically
a real problem.
The one caveat I would consider is if we wanted to let a multi-pack-index
cover thin packs, but that would be a substantial change to the object
database that I do not plan to tackle any time soon.
>> I'm also a little curious how bad it is to have a midx whose pack has
>> gone away. I guess we'd answer queries for "yes, we have this object"
>> even if we don't, which is bad. Though in practice we'd only delete
>> those packs if we have their objects elsewhere.
>
> Hmph, object O used to be in pack A and pack B, midx points at the
> copy in pack B but we deleted it because the pack is redundant.
> Now, midx says O exists, but midx cannot be used to retrieve O. We
> need to ask A.idx about O to locate it.
>
> That sounds brittle. I am not sure our error fallback is all that
> patient.
The best I can see is that prepare_midx_pack() will return 1 if the
pack no longer exists, and this would cause a die("error preparing
packfile from multi-pack-index") in nth_midxed_pack_entry(), killing
the following stack trace:
+ find_pack_entry():packfile.c
+ fill_midx_entry():midx.c
+ nth_midxed_pack_entry():midx.c
Perhaps that die() is a bit over-zealous since we could return 1
through this stack and find_pack_entry() could continue searching
for the object in the packed_git list. However, it could start
returning false _negatives_ if there were duplicates of the object
in the multi-pack-index but only the latest copy was deleted (and
the object does not appear in a pack-file outside of the multi-
pack-index).
Thanks,
-Stolee
next prev parent reply other threads:[~2020-08-25 16:18 UTC|newest]
Thread overview: 78+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-08-25 2:01 [PATCH] builtin/repack.c: invalidate MIDX only when necessary Taylor Blau
2020-08-25 2:26 ` Jeff King
2020-08-25 2:37 ` Taylor Blau
2020-08-25 13:14 ` Derrick Stolee
2020-08-25 14:41 ` Taylor Blau
2020-08-25 15:14 ` Derrick Stolee
2020-08-25 15:42 ` Taylor Blau
2020-08-25 16:56 ` Jeff King
2020-08-25 15:58 ` Junio C Hamano
2020-08-25 16:08 ` Taylor Blau
2020-08-25 16:18 ` Derrick Stolee [this message]
2020-08-25 17:34 ` Jeff King
2020-08-25 17:22 ` Jeff King
2020-08-25 18:05 ` Junio C Hamano
2020-08-25 18:27 ` Jeff King
2020-08-25 22:45 ` [PATCH] pack-redundant: gauge the usage before proposing its removal Junio C Hamano
2020-08-25 23:09 ` Taylor Blau
2020-08-25 23:22 ` Junio C Hamano
2020-08-26 1:17 ` [PATCH v1 0/3] War on dashed-git Junio C Hamano
2020-08-26 1:17 ` [PATCH v1 1/3] transport-helper: do not run git-remote-ext etc. in dashed form Junio C Hamano
2020-08-26 1:24 ` Eric Sunshine
2020-08-26 7:55 ` Johannes Schindelin
2020-08-26 16:27 ` Junio C Hamano
2020-08-26 1:17 ` [PATCH v1 2/3] cvsexportcommit: do not run git programs " Junio C Hamano
2020-08-26 1:28 ` Eric Sunshine
2020-08-26 1:42 ` Junio C Hamano
2020-08-26 16:08 ` Junio C Hamano
2020-08-26 16:28 ` Junio C Hamano
2020-08-26 8:02 ` Johannes Schindelin
2020-08-26 1:17 ` [PATCH v1 3/3] git: catch an attempt to run "git-foo" Junio C Hamano
2020-08-26 1:19 ` Junio C Hamano
2020-08-26 8:06 ` Johannes Schindelin
2020-08-26 16:30 ` Junio C Hamano
2020-08-28 2:13 ` Johannes Schindelin
2020-08-28 22:03 ` Junio C Hamano
2020-08-31 9:59 ` Johannes Schindelin
2020-08-31 17:45 ` Junio C Hamano
2020-12-20 15:25 ` Johannes Schindelin
2020-12-21 22:24 ` Junio C Hamano
2020-12-30 5:30 ` Johannes Schindelin
2020-08-26 8:09 ` [PATCH v1 0/3] War on dashed-git Johannes Schindelin
2020-08-26 16:45 ` Junio C Hamano
2020-08-26 19:46 ` [PATCH v2 0/2] avoid running "git-subcmd" in the dashed form Junio C Hamano
2020-08-26 19:46 ` [PATCH v2 1/2] transport-helper: do not run git-remote-ext etc. in " Junio C Hamano
2020-08-26 19:46 ` [PATCH v2 2/2] cvsexportcommit: do not run git programs " Junio C Hamano
2020-08-26 21:37 ` [PATCH v2 3/2] credential-cache: use child_process.args Junio C Hamano
2020-08-26 22:25 ` [PATCH] run_command: teach API users to use embedded 'args' more Junio C Hamano
2020-08-27 4:21 ` Jeff King
2020-08-27 4:30 ` Junio C Hamano
2020-08-27 4:31 ` Eric Sunshine
2020-08-27 4:44 ` Jeff King
2020-08-27 5:03 ` Eric Sunshine
2020-08-27 5:25 ` [PATCH] worktree: fix leak in check_clean_worktree() Jeff King
2020-08-27 5:56 ` Eric Sunshine
2020-08-27 15:31 ` Junio C Hamano
2020-08-27 4:13 ` [PATCH v2 3/2] credential-cache: use child_process.args Jeff King
2020-08-27 4:22 ` Jeff King
2020-08-27 4:31 ` Junio C Hamano
2020-08-27 4:14 ` Jeff King
2020-08-27 15:34 ` Junio C Hamano
2020-08-31 22:56 ` Junio C Hamano
2020-09-01 4:49 ` Jeff King
2020-09-01 16:11 ` Junio C Hamano
2020-08-27 0:57 ` [PATCH v2 0/2] avoid running "git-subcmd" in the dashed form Derrick Stolee
2020-08-27 1:22 ` Junio C Hamano
2020-08-28 9:14 ` [PATCH] pack-redundant: gauge the usage before proposing its removal Jeff King
2020-08-28 22:45 ` Junio C Hamano
2020-08-25 7:55 ` [PATCH] builtin/repack.c: invalidate MIDX only when necessary Son Luong Ngoc
2020-08-25 12:45 ` Derrick Stolee
2020-08-25 14:45 ` Taylor Blau
2020-08-25 16:04 ` [PATCH v2] " Taylor Blau
2020-08-26 20:51 ` Derrick Stolee
2020-08-26 20:54 ` Junio C Hamano
2020-08-25 16:47 ` [PATCH] " Jeff King
2020-08-25 17:10 ` Derrick Stolee
2020-08-25 17:29 ` Jeff King
2020-08-25 17:34 ` Taylor Blau
2020-08-25 17:42 ` Jeff King
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
List information: http://vger.kernel.org/majordomo-info.html
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=c78b3108-b760-d252-9428-6f03549fea11@gmail.com \
--to=stolee@gmail.com \
--cc=dstolee@microsoft.com \
--cc=git@vger.kernel.org \
--cc=gitster@pobox.com \
--cc=me@ttaylorr.com \
--cc=peff@peff.net \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
Code repositories for project(s) associated with this public inbox
https://80x24.org/mirrors/git.git
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).