git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: John Cai <johncai86@gmail.com>
To: Junio C Hamano <gitster@pobox.com>
Cc: Christian Couder <christian.couder@gmail.com>,
	Taylor Blau <me@ttaylorr.com>,
	Robert Coup <robert.coup@koordinates.com>,
	John Cai via GitGitGadget <gitgitgadget@gmail.com>,
	git <git@vger.kernel.org>, Derrick Stolee <stolee@gmail.com>
Subject: Re: [PATCH v2 0/4] [RFC] repack: add --filter=
Date: Sat, 26 Feb 2022 11:01:46 -0500	[thread overview]
Message-ID: <36CA51FE-8B7F-4D08-A91D-95D8F76606C9@gmail.com> (raw)
In-Reply-To: <xmqqv8x5v0qc.fsf@gitster.g>

Thank you for everyone's feedback. Really appreciate the collaboration!

On 23 Feb 2022, at 14:31, Junio C Hamano wrote:

> Christian Couder <christian.couder@gmail.com> writes:
>
>>> For what it's worth, I am fine having a mode of repack which allows us
>>> to remove objects that we know are stored by a promisor remote. But this
>>> series doesn't do that, so users could easily run `git repack -d
>>> --filter=...` and find that they have irrecoverably corrupted their
>>> repository.
>>
>> In some cases we just know the objects we are removing are stored by a
>> promisor remote or are replicated on different physical machines or
>> both, so you should be fine with this.
>
> So, we need to decide if an object we have that is outside the
> narrowed filter definition was (and still is, but let's keep the
> assumption the whole lazy clone mechanism makes: promisor remotes
> will never shed objects that they once served) available at the
> promisor remote, but I suspect we have too little information to
> reliably do so.  It is OK to assume that objects in existing packs
> taken from the promisor remotes and everything reachable from them
> (but missing from our object store) will be available to us from
> there.  But if we see an object that is outside of _new_ filter spec
> (e.g. you fetched with "max 100MB", now you are refiltering with
> "max 50MB", narrowing the spec, and you need to decide for an object
> that weigh 70MB), can we tell if that can be retrieved from the
> promisor or is it unique to our repository until we push it out?  I
> am not sure.  For that matter, do we even have a way to compare if
> the new filter spec is a subset, a superset, or neither, of the
> original filter spec?

let me try to summarize (perhaps over simplify) the main concern folks have
with this feature, so please correct me if I'm wrong!

As a user, if I apply a filter that ends up deleting objects that it turns
out do not exist anywhere else, then I have irrecoverably corrupted my
repository.

Before git allows me to delete objects from my repository, it should be pretty
certain that I have path to recover those objects if I need to.

Is that correct? It seems to me that, put another way, we don't want to give
users too much rope to hang themselves.

I can see why we would want to do this. In this case, there have been a couple
of alternative ideas proposed throughout this thread that I think are viable and
I wanted to get folks thoughts.

1. split pack file - (Rob gave this idea and Taylor provided some more detail on
   how using pack-objects would make it fairly straightforward to implement)

when a user wants to apply a filter that removes objects from their repository,
split the packfile into one containing objects that are filtered out, and
another packfile with objects that remain.

pros: simple to implement
cons: does not address the question "how sure am I that the objects I want to
filter out of my repository exist on a promsior remote?"

2. check the promisor remotes to see if they contain the objects that are about
   to get deleted. Only delete objects that we find on promisor remotes.

pros: provides assurance that I have access to objects I am about to delete from
a promsior remote.
cons: more complex to implement. [*]

Out of these two, I like 2 more for the aforementioned pros.

* I am beginning to look into how fetches work and am still pretty new to the
codebase so I don't know if this is even feasible, but I was thinking perhaps
we could write a function that fetches with a --filter and create a promisor
packfile containing promisor objects (this operaiton would have to somehow
ignore the presence of the actual objects in the repository). Then, we would
have a record of objects we have access to. Then, repack --filter can remove
only the objects contained in this promisor packfile.

>
>> If you are not fine with this because sometimes a user might use it
>> without knowing, then why are you ok with commands deleting refs not
>> checking that there isn't a regular repack removing dangling objects?
>
> Sorry, I do not follow this argument.  Your user may do "branch -D"
> because the branch deleted is no longer needed, which may mean that
> objects only reachable from the deleted branch are no longer needed.
> I do not see what repack has anything to do with that.  As long as
> the filter spec does not change (in other words, before this series
> is applied), the repack that discards objects that are known to be
> reachable from objects in packs retrieved from promisor remote, the
> objects that are no longer reachable may be removed and that will
> not lose objects that we do not know to be retrievable from there
> (which is different from objects that we know are unretrievable).
> But with filter spec changing after the fact, I am not sure if that
> is safe.  IOW, "commands deleting refs" may have been OK without
> this series, but this series may be what makes it not OK, no?
>
> Puzzled.

  reply	other threads:[~2022-02-26 16:02 UTC|newest]

Thread overview: 34+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-01-27  1:49 [PATCH 0/2] repack: add --filter= John Cai via GitGitGadget
2022-01-27  1:49 ` [PATCH 1/2] pack-objects: allow --filter without --stdout John Cai via GitGitGadget
2022-01-27  1:49 ` [PATCH 2/2] repack: add --filter=<filter-spec> option John Cai via GitGitGadget
2022-01-27 15:03   ` Derrick Stolee
2022-01-29 19:14     ` John Cai
2022-01-30  8:16       ` Christian Couder
2022-01-30 13:02       ` John Cai
2022-02-09  2:10 ` [PATCH v2 0/4] [RFC] repack: add --filter= John Cai via GitGitGadget
2022-02-09  2:10   ` [PATCH v2 1/4] pack-objects: allow --filter without --stdout John Cai via GitGitGadget
2022-02-09  2:10   ` [PATCH v2 2/4] repack: add --filter=<filter-spec> option John Cai via GitGitGadget
2022-02-09  2:10   ` [PATCH v2 3/4] upload-pack: allow missing promisor objects John Cai via GitGitGadget
2022-02-09  2:10   ` [PATCH v2 4/4] tests for repack --filter mode John Cai via GitGitGadget
2022-02-17 16:14     ` Robert Coup
2022-02-17 20:36       ` John Cai
2022-02-09  2:27   ` [PATCH v2 0/4] [RFC] repack: add --filter= John Cai
2022-02-16 15:39   ` Robert Coup
2022-02-16 21:07     ` John Cai
2022-02-21  3:11       ` Taylor Blau
2022-02-21 15:38         ` Robert Coup
2022-02-21 17:57           ` Taylor Blau
2022-02-21 21:10         ` Christian Couder
2022-02-21 21:42           ` Taylor Blau
2022-02-22 17:11             ` Christian Couder
2022-02-22 17:33               ` Taylor Blau
2022-02-23 15:40               ` Robert Coup
2022-02-23 19:31               ` Junio C Hamano
2022-02-26 16:01                 ` John Cai [this message]
2022-02-26 17:29                   ` Taylor Blau
2022-02-26 20:19                     ` John Cai
2022-02-26 20:30                       ` Taylor Blau
2022-02-26 21:05                         ` John Cai
2022-02-26 21:44                           ` Taylor Blau
2022-02-22 18:52             ` John Cai
2022-02-22 19:35               ` Taylor Blau

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=36CA51FE-8B7F-4D08-A91D-95D8F76606C9@gmail.com \
    --to=johncai86@gmail.com \
    --cc=christian.couder@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitgitgadget@gmail.com \
    --cc=gitster@pobox.com \
    --cc=me@ttaylorr.com \
    --cc=robert.coup@koordinates.com \
    --cc=stolee@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).