git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Taylor Blau <me@ttaylorr.com>
To: Christian Couder <christian.couder@gmail.com>
Cc: Taylor Blau <me@ttaylorr.com>, John Cai <johncai86@gmail.com>,
	Robert Coup <robert.coup@koordinates.com>,
	John Cai via GitGitGadget <gitgitgadget@gmail.com>,
	git <git@vger.kernel.org>, Derrick Stolee <stolee@gmail.com>
Subject: Re: [PATCH v2 0/4] [RFC] repack: add --filter=
Date: Tue, 22 Feb 2022 12:33:04 -0500	[thread overview]
Message-ID: <YhUeUCIetu/aOu6k@nand.local> (raw)
In-Reply-To: <CAP8UFD3U4t-inWC5mZYhybWpjVwkqA7v4hYZ5voBOEJ=+_Y1kQ@mail.gmail.com>

Hi Christian,

I think my objections may be based on a misunderstanding of John and
your original proposal. From reading [1], it seemed to me like a
required step of this proposal was to upload the objects you want to
filter out ahead of time, and then run `git repack -ad --filter=...`.

So my concerns thus far have been around the lack of cohesion between
(1) the filter which describes the set of objects uploaded to the HTTP
server, and (2) the filter used when re-filtering the repository.

If (1) and (2) aren't inverses of each other, then in the case where (2)
leaves behind an object which wasn't caught by (1), we have lost that
object.

If instead the server used in your script at [1] is a stand-in for an
ordinary Git remote, that changes my thinking significantly. See below
for more details:

On Tue, Feb 22, 2022 at 06:11:11PM +0100, Christian Couder wrote:
> > > > I share the same concern as Robert and Stolee do. But I think this issue
> > > > goes deeper than just naming.
> > > >
> > > > Even if we called this `git repack --delete-filter` and only ran it with
> > > > `--i-know-what-im-doing` flag, we would still be leaving repository
> > > > corruption on the table, just making it marginally more difficult to
> > > > achieve.
> > >
> > > My opinion on this is that the promisor object mechanism assumes by
> > > design that some objects are outside a repo, and that this repo
> > > shouldn't care much about these objects possibly being corrupted.
> >
> > For what it's worth, I am fine having a mode of repack which allows us
> > to remove objects that we know are stored by a promisor remote. But this
> > series doesn't do that, so users could easily run `git repack -d
> > --filter=...` and find that they have irrecoverably corrupted their
> > repository.
>
> In some cases we just know the objects we are removing are stored by a
> promisor remote or are replicated on different physical machines or
> both, so you should be fine with this.

I am definitely OK with a convenient way to re-filter your repository
locally so long as you know that the objects you are filtering out are
available via some promisor remote.

But perhaps I have misunderstood what this proposal is for. Reading
through John's original cover letter and the link to your demo script, I
understood that a key part of this was being able to upload the pack of
objects you were about to filter out of your local copy to some server
(not necessarily Git) over HTTP.

My hesitation so far has been based on that understanding. Reading these
patches, I don't see a mechanism to upload objects we're about to
expunge to a promisor remote.

But perhaps I'm misunderstanding: if you are instead assuming that the
existing set of remotes can serve any objects that we deleted, and this
is the way to delete them, then I am OK with that approach. But I think
either way, I am missing some details in the original proposal that
would have perhaps made it easier for me to understand what your goals
are.

In any case, this patch series doesn't seem to correctly set up a
promisor remote for me, since doing the following on a fresh clone of
git.git (after running "make"):

    $ bin-wrappers/git repack -ad --filter=blob:none
    $ bin-wrappers/git fsck

results in many "missing blob" and "missing link" lines out of output.

(FWIW, I think what's missing here is correctly setting up the affected
remote(s) as promisors and indicating that we're now in a filtered
setting when going from a full clone down to a partial one.)

> If you are not fine with this because sometimes a user might use it
> without knowing, then why are you ok with commands deleting refs not
> checking that there isn't a regular repack removing dangling objects?

I'm not sure I totally understand your question, but my general sense
has been "because we typically make it difficult / impossible to remove
reachable objects".

Thanks,
Taylor

[1]: https://gitlab.com/chriscool/partial-clone-demo/-/blob/master/http-promisor/server_demo.txt#L47-52

  reply	other threads:[~2022-02-22 17:33 UTC|newest]

Thread overview: 34+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-01-27  1:49 [PATCH 0/2] repack: add --filter= John Cai via GitGitGadget
2022-01-27  1:49 ` [PATCH 1/2] pack-objects: allow --filter without --stdout John Cai via GitGitGadget
2022-01-27  1:49 ` [PATCH 2/2] repack: add --filter=<filter-spec> option John Cai via GitGitGadget
2022-01-27 15:03   ` Derrick Stolee
2022-01-29 19:14     ` John Cai
2022-01-30  8:16       ` Christian Couder
2022-01-30 13:02       ` John Cai
2022-02-09  2:10 ` [PATCH v2 0/4] [RFC] repack: add --filter= John Cai via GitGitGadget
2022-02-09  2:10   ` [PATCH v2 1/4] pack-objects: allow --filter without --stdout John Cai via GitGitGadget
2022-02-09  2:10   ` [PATCH v2 2/4] repack: add --filter=<filter-spec> option John Cai via GitGitGadget
2022-02-09  2:10   ` [PATCH v2 3/4] upload-pack: allow missing promisor objects John Cai via GitGitGadget
2022-02-09  2:10   ` [PATCH v2 4/4] tests for repack --filter mode John Cai via GitGitGadget
2022-02-17 16:14     ` Robert Coup
2022-02-17 20:36       ` John Cai
2022-02-09  2:27   ` [PATCH v2 0/4] [RFC] repack: add --filter= John Cai
2022-02-16 15:39   ` Robert Coup
2022-02-16 21:07     ` John Cai
2022-02-21  3:11       ` Taylor Blau
2022-02-21 15:38         ` Robert Coup
2022-02-21 17:57           ` Taylor Blau
2022-02-21 21:10         ` Christian Couder
2022-02-21 21:42           ` Taylor Blau
2022-02-22 17:11             ` Christian Couder
2022-02-22 17:33               ` Taylor Blau [this message]
2022-02-23 15:40               ` Robert Coup
2022-02-23 19:31               ` Junio C Hamano
2022-02-26 16:01                 ` John Cai
2022-02-26 17:29                   ` Taylor Blau
2022-02-26 20:19                     ` John Cai
2022-02-26 20:30                       ` Taylor Blau
2022-02-26 21:05                         ` John Cai
2022-02-26 21:44                           ` Taylor Blau
2022-02-22 18:52             ` John Cai
2022-02-22 19:35               ` Taylor Blau

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=YhUeUCIetu/aOu6k@nand.local \
    --to=me@ttaylorr.com \
    --cc=christian.couder@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitgitgadget@gmail.com \
    --cc=johncai86@gmail.com \
    --cc=robert.coup@koordinates.com \
    --cc=stolee@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).