git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Christian Couder <christian.couder@gmail.com>
To: Taylor Blau <me@ttaylorr.com>
Cc: John Cai <johncai86@gmail.com>,
	Robert Coup <robert.coup@koordinates.com>,
	John Cai via GitGitGadget <gitgitgadget@gmail.com>,
	git <git@vger.kernel.org>, Derrick Stolee <stolee@gmail.com>
Subject: Re: [PATCH v2 0/4] [RFC] repack: add --filter=
Date: Tue, 22 Feb 2022 18:11:11 +0100	[thread overview]
Message-ID: <CAP8UFD3U4t-inWC5mZYhybWpjVwkqA7v4hYZ5voBOEJ=+_Y1kQ@mail.gmail.com> (raw)
In-Reply-To: <YhQHYQ9b9bYYv10r@nand.local>

On Mon, Feb 21, 2022 at 10:42 PM Taylor Blau <me@ttaylorr.com> wrote:
>
> On Mon, Feb 21, 2022 at 10:10:15PM +0100, Christian Couder wrote:
> > > > Also, to have more protection we can either
> > > >
> > > > 1. add a config value that needs to be set to true for repack to remove
> > > > objects (repack.allowDestroyFilter).
> >
> > I don't think it's of much value. We don't have such config values for
> > other possibly destructive operations.
> >
> > > > 2. --filter is dry-run by default and prints out objects that would have been removed,
> > > > and it has to be combined with another flag --destroy in order for it to actually remove
> > > > objects from the odb.
> >
> > I am not sure it's of much value either compared to naming it
> > --filter-destroy. It's likely to just make things more difficult for
> > users to understand.
>
> On this and the above, I agree with Christian.
>
> > > I share the same concern as Robert and Stolee do. But I think this issue
> > > goes deeper than just naming.
> > >
> > > Even if we called this `git repack --delete-filter` and only ran it with
> > > `--i-know-what-im-doing` flag, we would still be leaving repository
> > > corruption on the table, just making it marginally more difficult to
> > > achieve.
> >
> > My opinion on this is that the promisor object mechanism assumes by
> > design that some objects are outside a repo, and that this repo
> > shouldn't care much about these objects possibly being corrupted.
>
> For what it's worth, I am fine having a mode of repack which allows us
> to remove objects that we know are stored by a promisor remote. But this
> series doesn't do that, so users could easily run `git repack -d
> --filter=...` and find that they have irrecoverably corrupted their
> repository.

In some cases we just know the objects we are removing are stored by a
promisor remote or are replicated on different physical machines or
both, so you should be fine with this.

If you are not fine with this because sometimes a user might use it
without knowing, then why are you ok with commands deleting refs not
checking that there isn't a regular repack removing dangling objects?

Also note that people who want to remove objects using a filter can
already do it by cloning with a filter and then replacing the original
packs with the packs from the clone. So refusing this new feature is
just making things more cumbersome.

> I think that there are some other reasonable directions, though. One
> which Robert and I discussed was making it possible to split a
> repository into two packs, one which holds objects that match some
> `--filter` criteria, and one which holds the objects that don't match
> that filter.

I am ok with someone implementing this feature, but if an option that
actually deletes the filtered objects is rejected then such a feature
will be used with some people just deleting one of the resulting packs
(and they might get it wrong), so I don't think any real safety will
be gained.

> Another option would be to prune the repository according to objects
> that are already made available by a promisor remote.

If the objects have just been properly transferred to the promisor
remote, the check will just waste resources.

> An appealing quality about the above two directions is that the first
> doesn't actually remove any objects, just makes it easier to push a
> whole pack of unwanted objects off to a promsior remote. The second
> prunes the repository according to objects that are already made
> available by the promisor remote. (Yes, there is a TOCTOU race there,
> too, but it's the same prune-while-pushing race that Git already has
> today).
>
> > I am not against a name and some docs that strongly state that users
> > should be very careful when using such a command, but otherwise I
> > think such a command is perfectly ok. We have other commands that by
> > design could lead to some objects or data being lost.
>
> I can think of a handful of ways to remove objects which are unreachable
> from a repository, but I am not sure we have any ways to remove objects
> which are reachable.

Cloning with a filter already does that. It's by design in the
promisor object and partial clone mechanisms that reachable objects
are removed. Having more than one promisor remote, which is an
existing mechanism, means that it's just wasteful to require all the
remotes to have all the reachable objects, so how could people easily
set up such remotes? Why make it unnecessarily hard and forbid a
straightforward way?

  reply	other threads:[~2022-02-22 17:11 UTC|newest]

Thread overview: 34+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-01-27  1:49 [PATCH 0/2] repack: add --filter= John Cai via GitGitGadget
2022-01-27  1:49 ` [PATCH 1/2] pack-objects: allow --filter without --stdout John Cai via GitGitGadget
2022-01-27  1:49 ` [PATCH 2/2] repack: add --filter=<filter-spec> option John Cai via GitGitGadget
2022-01-27 15:03   ` Derrick Stolee
2022-01-29 19:14     ` John Cai
2022-01-30  8:16       ` Christian Couder
2022-01-30 13:02       ` John Cai
2022-02-09  2:10 ` [PATCH v2 0/4] [RFC] repack: add --filter= John Cai via GitGitGadget
2022-02-09  2:10   ` [PATCH v2 1/4] pack-objects: allow --filter without --stdout John Cai via GitGitGadget
2022-02-09  2:10   ` [PATCH v2 2/4] repack: add --filter=<filter-spec> option John Cai via GitGitGadget
2022-02-09  2:10   ` [PATCH v2 3/4] upload-pack: allow missing promisor objects John Cai via GitGitGadget
2022-02-09  2:10   ` [PATCH v2 4/4] tests for repack --filter mode John Cai via GitGitGadget
2022-02-17 16:14     ` Robert Coup
2022-02-17 20:36       ` John Cai
2022-02-09  2:27   ` [PATCH v2 0/4] [RFC] repack: add --filter= John Cai
2022-02-16 15:39   ` Robert Coup
2022-02-16 21:07     ` John Cai
2022-02-21  3:11       ` Taylor Blau
2022-02-21 15:38         ` Robert Coup
2022-02-21 17:57           ` Taylor Blau
2022-02-21 21:10         ` Christian Couder
2022-02-21 21:42           ` Taylor Blau
2022-02-22 17:11             ` Christian Couder [this message]
2022-02-22 17:33               ` Taylor Blau
2022-02-23 15:40               ` Robert Coup
2022-02-23 19:31               ` Junio C Hamano
2022-02-26 16:01                 ` John Cai
2022-02-26 17:29                   ` Taylor Blau
2022-02-26 20:19                     ` John Cai
2022-02-26 20:30                       ` Taylor Blau
2022-02-26 21:05                         ` John Cai
2022-02-26 21:44                           ` Taylor Blau
2022-02-22 18:52             ` John Cai
2022-02-22 19:35               ` Taylor Blau

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAP8UFD3U4t-inWC5mZYhybWpjVwkqA7v4hYZ5voBOEJ=+_Y1kQ@mail.gmail.com' \
    --to=christian.couder@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitgitgadget@gmail.com \
    --cc=johncai86@gmail.com \
    --cc=me@ttaylorr.com \
    --cc=robert.coup@koordinates.com \
    --cc=stolee@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).